ABSTRACT



The development of large-scale integrated circuits in 
the last few years has resulted in a rapid increase in the 
number of digital devices available for electronic system 
design. Skilled system designers often do not have an abun- 
dance of software experience and require better tools than 
are presently available in order to take maximum advantage
of microprocessors and other functional building blocks. A 
case is made for the development of a high-level language 
compiler which will allow the designer to specify not only 
his algorithm but also his hardware configuration and his 
optimization constraints. The PL/M compiler developed by 
Intel Corporation is used as a model for examining some of
the requirements of this "machine-independent" compiler. A
summary of work which was done to implement the first stages
of such a compiler is presented, and factors which must be
considered in order to further this work are discussed. 

I. INTRODUCTION



A. PROBLEM DEFINITION 

The most promising and widely discussed device in the 
electronics industry during the last few years has been the 
microprocessor. This device packages the central processing
unit and associated elements of a digital computer into a 
handful of integrated circuit chips; in many cases only one 
chip is used. Since the advent of the microprocessor in 
1971 it has become much easier to incorporate the power of a 
digital computer into the design of an electronic system. 
Compared with custom Large Scale Integration (LSI) circuits,
microprocessors are convenient, flexible, low-cost devices
which have allowed sophisticated features to be made
available in relatively simple systems. As a result their use
has expanded rapidly, and many people who have had limited
programming experience are now being forced to write
programs as part of their design efforts.

The term "firmware" has come to be used for systems 
which utilize programmable digital components, since the
development of such systems requires both hardware and
software design. The design of a firmware system, whether
it uses a microprocessor or some other means of providing a
programming capability, is a complex task requiring the best
skills of both the electronics engineer and the computer 
programmer. 

Another development which, although conceived in the
early 1950's, has become significant only in the last decade
is the use of microprogramming in digital systems.
Microprogramming differs from "normal" programming primarily
in the level of detail considered. Each instruction in the
instruction set of a typical digital computer requires
several hardware operations to be performed, but these
operations are transparent to the programmer. In
microprogrammable systems each of these primitive operations may be
invoked by a microinstruction. Initially, as in the IBM
System/360, microprogramming was done only by the
manufacturer, but today there are general purpose computers (e.g.,
the Hewlett-Packard HP-3100 and Burroughs "D" machine) which
allow user microprogramming. In addition there are
proposals for using standard functional modules in the
implementation of special purpose digital systems [56]. These
modular systems will be controlled by what amount to
microprograms.

As in the case of the microprocessor, microprogrammed
systems will in most instances be programmed by engineers
who have a firm background in hardware design but who may
have minimal software experience. Thus it is becoming
increasingly necessary that programming languages be developed
which are easy to use and which can produce good control
code for a variety of architectures. The compiler for such
a language could be considered a software computer aided
design (CAD) tool for the engineer. Ideally, it would
accept a description of the algorithm to be performed (the
program) and descriptions of the hardware and the format of
the control code; the output would then be a control program
to perform the algorithm.

Succeeding chapters of this thesis examine some of the
considerations necessary in the development of a programming
language for firmware system design. Chapter II contains a
discussion of programming languages and the advantages and
disadvantages of high-level languages. The language PL/M is
presented in Chapter III and is used as a basis for
examining the necessary features of a high-level language. The
implementation of pass 1 of a PL/M compiler is described in
Chapter IV. This chapter also describes some of the
theoretical aspects of programming language design and
implementation. The output of pass 1 is an intermediate
language representation of a source language program, an
important concept which is discussed in Chapter V.

A major factor in the implementation of any digital
system is the system architecture. Chapter VI contains
descriptions of various types of architectures and a
discussion of the influence of architecture on language design.
Optimization of the output code is another important
consideration in the design of a compiler. Many firmware
systems will be produced in large numbers, and the amount of
hardware used will have a significant impact on the cost.
Because of the fierce competition among manufacturers, good
optimization techniques will be critical in the
implementation of these systems. Chapter VII is devoted to
the topic of compiler optimization.

The topics discussed in Chapters IV-VII are tied together
in Chapter VIII, which shows how they all influence the
design of a compiler for user-definable architectures.
Chapter IX summarizes the conclusions reached during the 
study of the problem and presents a list of recommendations 
for future work. 

B. SOFTWARE ENGINEERING 

With the rapidly growing use of digital techniques in
electronic system design has come the emergence of a new
discipline, that of software engineering, to address
problems at the hardware-software interface. Although digital
computers have been in existence for more than 30 years, it
is only today becoming widely recognized that the software
design considerations are at least as important as the
hardware design considerations in digital system development
[11]. The acceptance of the fact that software problems are
of more than academic interest is highlighted by the
recently inaugurated publication of a new technical journal--the
IEEE Transactions on Software Engineering. Because software
engineering addresses many issues which are very closely
related to the firmware design problem, its goals and
principles--as defined by Ross, Goodenough, and Irvine
[51]--are outlined below. These ideas have a strong
influence on much of the remainder of this thesis.

The goals of software engineering are:

1) Modifiability--This refers to the ability to make
controlled changes in a program. In a large system
software modifications have to be made during
development as well as after production has begun.
Modifications may be made either to correct errors or to change
or add features and provide varying levels of
performance (i.e., a "family" of systems).

2) Efficiency--This goal is concerned with the best
utilization of the resources available. Typically this
means using the least memory and time in performing the
task. Efficiency is usually "... prematurely permitted
a high priority in engineering tradeoffs ... (but) does
not dominate the practice of software engineering."
[51, p. 20-21]

3) Reliability--This is a critical goal, especially for
software systems used in real-time control
applications. Unfortunately reliability has too often in the
past been considered as secondary to efficiency in
software development.

4) Understandability--This goal supports the goals of
modifiability and reliability. If a piece of software
is easily understandable, it is easy to modify and easy
to check for reliability. It is unfortunate that
understandability is usually considered to reduce
efficiency, but this relationship does not necessarily
hold. Increased understandability can lead to the
detection of inefficiencies in a large system.

There are seven principles of software engineering which
may be applied in order to achieve the goals. These
principles are:

1) Modularity--This refers to the purposeful structuring
of a system. Modularity is an important principle in
both hardware and software design.

2) Abstraction--The unessential details are omitted at any
given level in the design, leaving only abstract
concepts for consideration.

3) Localization--Limiting the scope of a structure or a
concept is closely related to modularity. Localization
enhances confirmability and understandability.

4) Hiding--"... [T]he purpose of hiding is to make
inaccessible certain details that should not affect other
parts of a system." [51, p.22]

5) Uniformity--It is important that definitions and
concepts be applied uniformly across a system.

6) Completeness--Specifying all details and leaving
nothing to chance greatly increases reliability.

7) Confirmability--This refers to the ability to determine
whether all the design goals have been met.

Software engineering is concerned with the question of
whether it is more important to have very efficient code, in
the sense that it uses the minimum amount of memory and
executes at the maximum rate (two goals which, by the way, are
usually not compatible), or whether it is more important to
have code which is reliable, easy to modify, and takes a
minimum amount of time to develop. As in all engineering
disciplines, software engineering is involved with
tradeoffs among the various alternatives, since no one
answer will be correct in all situations.

II. PROGRAMMING LANGUAGES



In the early days of digital computer use it became
evident that an alternative to machine language programming was
needed. A computer program is really nothing more than a
series of binary digits contained in some storage medium,
but human engineering dictated that early machine language
programs be represented as groups of octal digits. With the
introduction of mnemonics and assemblers to translate them,
programs became almost readable. Assemblers became more and
more sophisticated with the addition of macros, comment
fields, and conditional assembly features, but programs
still were tedious to write and difficult to read. The
drawback of assembly language programs is that they contain
too much information about the operation of the hardware
(contrary to the principle of abstraction), and this tends
to obscure information related to the algorithm being
implemented. Since there is essentially a one-to-one
correspondence between assembly language instructions and machine
instructions, assembly language programs still tend to be
very cumbersome and error-prone except when used for very
simple problems. Thus, as programs became increasingly
complex, high-level languages were introduced.

A. THE CASE FOR HIGH-LEVEL LANGUAGES

The development of high-level languages was spurred by
the desire to be able to write programs which are more
descriptive of the problems being solved and which depend
less on the actual hardware on which the programs are to
execute. "High-order languages represent a concept for
improving the understandability of programs by abstracting
from the details of computer instruction sets." [51, p.19]
These languages are designed to facilitate description of
the procedural steps involved in problem solution, and thus
they are often referred to as procedure-oriented languages
(as opposed to machine-oriented assembly languages).

The main advantages of programming in a high-level
language are:

1) The programmer is freed from the consideration of many
minor details. These details are mainly in the nature
of bookkeeping--memory allocation, register allocation,
assignment of temporary variables to hold the results
of partial computations, remembering branch locations,
type checking of variables, and many others. This
factor is becoming even more important with the increased
use of microprogrammable systems. "The inability of a
user to cope with a highly intricate, time-and-machine
dependent environment often results in inefficient, if
not error prone, microprograms." [47, p.791]

2) Efficient control structures greatly reduce the burden
of programming, resulting in increased reliability.

3) Symbolic user variables increase the readability of the
program. This is also one of the advantages assembly
languages have over machine languages.

4) The ability to write arbitrary arithmetic expressions
also increases readability and tends to reduce
computational errors.

5) Programmer productivity is increased because of the
expansion factor involved in the translation from
high-level language to machine language. It is
generally recognized that programmers produce, on average
in a large project, only a few lines of code per day,
whether it be machine code, assembly code, or
high-level language code.

6) Documentation is improved, because the program is more
understandable. A good high-level language encourages
the writing of programs which are essentially
self-documenting.

7) Maintenance, modification, and debugging are
facilitated because of the improved readability and
documentation.

8) Transportability is improved, because a high-level
language has little dependence on a particular machine
architecture. In fact one goal of language design is
complete machine independence. This topic is covered
more fully in Chapter VIII.

By far the most popular criticism of high-level
languages is based upon the concern for efficiency. There
are basically two sources of inefficiency. The first has to
do with the fact that in certain instances some languages
are too machine independent in that they do not recognize
features which are basic to computer hardware. For example,
FORTRAN does not contain primitive operations for bit
manipulation (shifting, rotating, masking, etc.). A good
high-level language should not restrict the programmer from doing
anything that he could do at the assembly or machine
language levels.
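
As an illustration of the kind of primitive operations meant here, the following sketch models shifting, rotating, and masking on an eight-bit quantity. Python is used only as a convenient executable notation; the function names shr8, ror8, and mask8 are assumptions made for this example and are not taken from any language discussed in the text.

```python
def shr8(value, count):
    """Logical shift right of an 8-bit value; vacated high bits become 0."""
    return (value & 0xFF) >> count


def ror8(value, count):
    """Rotate an 8-bit value right; bits leaving bit 0 re-enter at bit 7."""
    value &= 0xFF
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def mask8(value, mask):
    """Keep only the bits of an 8-bit value selected by the mask."""
    return value & mask & 0xFF
```

Each of these corresponds to a single instruction, or a very short sequence, on most processors, which is why a language that cannot express them directly forces the programmer into slower arithmetic workarounds.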

The second source of inefficiency lies within the code
generation process and is really a characteristic of the
compiler rather than the language. The complaints most
often voiced by those opposed to the use of a high-level
language are that the compiler generates too much code and
that the code generated is wasteful of time. However, the
point is usually demonstrated with only a small program
[19, 43].

These inefficiencies are really local in nature, since
each instance can usually be isolated to a few lines of
code. A good assembly language programmer can write locally
"optimal" code, but in a large program his code will suffer
from global inefficiencies (see advantage (1) above). Thus
"... data based upon comparisons between small programs will
tend to underestimate the advantage of the higher level
language for large programs." [23, p.214] Many large
programs written in high-level languages would have been very
difficult and costly to write in assembly language [34] and
probably would have been less efficient.

It is doubtful whether any compiler will ever be able to
generate completely locally optimal code (as compared with
assembly language versions), but there are many promising
techniques emerging (see Chapter VII). Experience has shown
that global inefficiency is a nonlinear function of program
length, and a good compiler can usually produce more
efficient code than an assembly language programmer for programs
longer than about 50 to 100 high-level language statements
[43]. For shorter programs an engineering decision must be
made as to whether the advantages of programming in a
high-level language outweigh the loss in efficiency. A more
complete discussion of this topic is presented in Section
VII.A. As memory costs continue to fall, the extra code
generated by the local inefficiencies in a compiler will
take on lesser significance even in small system design
projects.

B. SYSTEM PROGRAMMING LANGUAGES 

An area very closely related to firmware design is that
of system programming. For many years system programmers
have avoided the use of high-level languages, and for the
same reason that the designers of programmable hardware
(firmware) are now avoiding them--inefficiency. In addition
to the fact that software engineering considerations are
causing this position to be reevaluated, many advances have
been made in the past few years in the area of programming
language design. The development of good,
machine-independent, high-level languages for system programming has
been studied for several years [14, 40], and a few languages
have been implemented.

The UNIX operating system [50], designed for the popular
PDP-11 series of minicomputers, is currently in use at more
than 50 installations around the country (including the
Computer Science Laboratory at the Naval Postgraduate School)
and was written almost entirely in the C language [49], an
Algol-like high-level language. In fact, the C compiler
itself was written in C. The fact that an interactive,
multi-user operating system as sophisticated as UNIX can be
implemented satisfactorily on a minicomputer confirms the
viability of high-level language programming in a situation
requiring efficient machine code.

C. COMPOSITE LANGUAGES 

One alternative solution to the problem of choosing 
between a high-level language and an assembly language is 
the composite language--a language which has (hopefully) the 
best features of both. The simplest implementation is a
high-level language which allows assembly code to be
inserted into a program. PL/360 is an example of this type of
language. The advantage of using such a language is claimed
to lie in the ability to make use of the efficiency of
the assembly language while retaining the benefits of
high-level language programming.

Aside from the loss of understandability there are two
major disadvantages in using this approach. First is the
loss of machine independence, which reduces
transportability. Each time the architecture of the hardware is changed
(e.g., by using a different processor or by rearranging a
modular system), the program must be carefully examined for
instructions which need to be changed. The second
disadvantage is the reduction in reliability brought about by the
fact that the programmer is allowed access to facilities
which normally are completely controlled by the compiler.
This can lead to conflicts (e.g., in resource allocation)
and may cause unexpected results and subtle side effects
which are difficult to trace.

These disadvantages were partially overcome in the
implementation of the Language for Systems Development (LSD)
[40]. In LSD the use of assembly language is restricted to
macros whose definitions are separate from the program
itself. Except for the fact that the notation involved seems
somewhat clumsy, this approach probably comes very close to
the ideal notion of a machine-independent compiler.

A slightly different approach was taken by Popper [43]
in the implementation of SMAL, which is in essence an
assembly language with some of the structure of a high-level
language. A SMAL program equivalent to the example
presented in Section VII.A was written by Popper, and it required
only four more bytes of memory than the assembly language
version. Although the SMAL version is easier to read than
the assembly language version, it is more difficult to read
than the PL/M version.

Composite approaches such as Popper's probably will be
very beneficial for programmers who are designing small
microprocessor systems, but they do not appear to provide the
best long-range solution to the firmware design problem.
Succeeding sections will make it evident that compiler
theory is advancing to the point of favoring the development
of high-level languages which do not allow such highly
machine-dependent features as are found in composite
languages.

III. THE PL/M LANGUAGE



In the effort to provide comprehensive software support
for its eight-bit microprocessors, Intel Corporation was led
naturally in 1973 to the development of the high-level
language PL/M [29, 33]. Since then several other
microprocessor manufacturers have announced either the availability
or the anticipated availability of PL/M compilers (with
possibly some slight modifications) for their microprocessors.
The first large scale application of the language by Intel,
ironically, was in the development of a sophisticated
macro-assembler to run on its Intellec 8 microcomputer
developmental system.

PL/M is derived from the XPL compiler-writing language,
which in turn is a derivative of PL/I. Thus PL/M is
very closely related to both of these languages in its
syntax and semantics. A complete list of the syntactic
productions is given in the Intel reference manual [29], and the
syntax and semantics of the C language version used for this
investigation are given in Appendix B (see file "m.gram").
It should be noted that the syntax for the C version is not
written in the standard BNF notation but rather in the
notation required by YACC (see Section IV.B.1).

There have been many proposals over the years for the
development of machine-independent programming languages
(e.g., MPL [18], which is also similar to XPL). PL/M,
although not currently machine-independent, has the
advantage of having been implemented and used for practical
system development. Thus PL/M was chosen as the vehicle for
examining some of the considerations in the development of a
machine-independent high-level language for firmware system
design. The remainder of this section is devoted to a brief
description of the language and a discussion of its
advantages and possible shortcomings.

A. LANGUAGE FEATURES

Lloyd and Van Dam have defined a high-level language to
be one which has the following features:

(1) Symbolic user variables (allocated by the compiler);

(2) Ability to evaluate arbitrary arithmetic or logical
expressions;

(3) Flow of control statements beyond simple
(conditional and unconditional) GOTO, SKIP, Branch and Link.
[38, p.537]

In his search for a high-level programming language,
Eckhouse found the need for one that was "... procedural,
descriptive, flexible, and possibly machine-independent."
[17, p.169] PL/M has all of these features, including a
limited kind of machine-independence. The latter feature is
exhibited in the ability of PL/M programs to be compiled for
either the 8008 or the 8080 microprocessor. Although these
two devices are both manufactured by Intel and have somewhat
similar instruction sets, they have different architectures
and a significant difference in the flexibility and speed of
execution of their instructions. These points will be
explored further in Chapters VII and VIII.

As are its predecessors, PL/M is a block-structured
language with a comprehensive set of control structures.
"... [T]he control structures of sequential flow,
conditional selection, and iteration are sufficient to implement any
algorithm." [60, p.35] Sequential flow is provided by the
simple statement and the DO-END group, while conditional
selection is accomplished by three constructs: IF-THEN and
IF-THEN-ELSE statements and DO CASE groups. The DO FOR
group is used for a fixed number of iterations, and the DO
WHILE group is used for iterating until some condition is
satisfied. The GOTO statement is also provided in PL/M for
use in those rare circumstances where the use of the other
control structures may be somewhat awkward. In recent years
considerations of software engineering have discouraged
indiscriminate use of the GOTO since "... goto-free
programming forces programmers to make explicit the conditions
under which a given statement is executed, and this can help
ensure understandability and prevent errors." [51, p.21]

PL/M is relatively easy to learn and read and has a
simple character set. This latter factor may be important,
since a language intended for use in a wide variety of
design environments should not require special character
sets such as those of APL or some of the proposed
microprogramming languages (e.g., see [47]). Ease of learning and
readability are important in increasing programmer
productivity and program modifiability and reliability.

In order to give a more complete picture of the features
of PL/M, a sample program [29] is presented in Figure 1.

    2048: /* is the origin of this program */
    declare tto literally '2', cr literally '15q',
        lf literally '0ah',
        true literally '1', false literally '0';

    square$root: procedure(x) byte;
        declare (x,y,z) address;
        y = x; z = shr(x+1,1);
            do while y <> z;
            y = z; z = shr(x/y + y + 1, 1);
            end;
        return y;
        end square$root;

    print$char: procedure(char);
        declare bit$cell literally '91',
            (char,i) byte;
        output(tto) = 0;
        call time(bit$cell);
            do i = 0 to 7;
            output(tto) = char; /* data pulses */
            char = ror(char,1);
            call time(bit$cell);
            end;
        output(tto) = 1;
        call time(bit$cell + bit$cell);
        /* automatic return is generated */
        end print$char;

    print$string: procedure(name,length);
        declare name address,
            (length,i,char based name) byte;
            do i = 0 to length - 1;
            call print$char(char(i));
            end;
        end print$string;

    print$number: procedure(number,base,chars,zero$suppress);
        declare number address,
            (base,chars,zero$suppress,i,j) byte;
        declare temp (16) byte;
        if chars > last(temp) then chars = last(temp);
            do i = 1 to chars;
            j = number mod base + '0';
            if j > '9' then j = j + 7;
            if zero$suppress and i <> 1 and number = 0 then
                j = ' ';
            temp(length(temp) - i) = j;
            number = number / base;
            end;
        call print$string(.temp+length(temp)-chars,chars);
        end print$number;

    declare i address,
        crlf literally 'cr,lf',
        heading data (crlf,lf,lf,
        ' table of square roots',
        crlf,lf,
        ' value root value root value root value root',
        ' value root',
        crlf,lf);

    /* silence tty and print computed values */
    output(tto) = 1;
        do i = 1 to 1000;
        if i mod 5 = 1 then
            do; if i mod 250 = 1 then
                call print$string(.heading,length(heading));
            else
                call print$string(.(cr,lf),2);
            end;
        call print$number(i,10,6,true /* true suppresses
                                         leading zeroes */);
        call print$number(square$root(i),10,6,true);
        end;

    declare monitor$uses (10) byte;
    eof



Figure 1. Sample PL/M program
for computing square roots (after [29])
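
The iteration used by the square root procedure of Figure 1 can be restated in executable form. The sketch below is a transliteration into Python, offered only as an aid to reading the figure; it assumes that SHR(v,1) is a one-bit logical right shift and that "/" on address values is truncating integer division, and it does not model 16-bit overflow. The name square_root is chosen for this example.

```python
def square_root(x):
    """Approximate integer square root by repeated averaging.

    Mirrors the loop of Figure 1: y holds the current estimate and
    z the next one; iteration stops when the estimate stops changing.
    """
    y = x
    z = (x + 1) >> 1                   # z = shr(x+1,1)
    while y != z:                      # do while y <> z
        y = z
        z = (x // y + y + 1) >> 1      # z = shr(x/y + y + 1, 1)
    return y & 0xFF                    # the procedure is declared BYTE


if __name__ == "__main__":
    for value in (1, 16, 100, 1000):
        print(value, square_root(value))
```

For arguments in the range used by the figure (1 to 1000) the loop settles on an integer within one of the true square root, so the BYTE result is never truncated.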

This program (as well as most other PL/M and C programs
reproduced in this thesis) is written in lower case
characters, since this is the normal input mode for the UNIX
operating system, which was used for all of the work
described. In addition to the features previously
mentioned, notice should be taken of the comment convention of
the language. Since comments can be placed anywhere within
a program (rather than on separate lines as in FORTRAN),
self-documentation is encouraged. Although the "/* */"
convention is a little awkward, it has the advantage of setting
off comments and not discouraging short comments (as does
the "COMMENT" convention in ALGOL).

Figure 2 presents a second sample PL/M program which 
demonstrates another significant feature of the languaqe-- 
the nested macro~definition capability. While the macro- 
definition concept is certainly not new/ there are many 
languages which do not allow macros (most notably FORTRAN 
and ALGOL). Many languages which do have a macro capability 
do not allow nesting. As can be seen/ the macros increase 
the readability of the program/ but there is another/ 
perhaps greater/ advantage in using them. By using macros 
the programmer can specify certain items only once in a program
(e.g., vector sizes and input/output ports) and then use the
macro names elsewhere in the program to refer to those items.
Later he can modify his program by merely changing the
appropriate macro definitions. While the advantages of being
able to do this are not as evident in the short program of
Figure 2 as they would be in a large program, it should be
apparent that this will increase the modifiability, and
consequently the reliability, of programs written in the
language.

     /* paper tape reader controller program */

     declare forever literally 'while 1',
             cdata   literally 'output(1)',
             cstat   literally 'output(2)',
             ccom    literally 'input(2)',
             ccps    literally 'input(3)',
             rdata   literally 'input(1)',
             rstat   literally 'input(0)',
             rcom    literally 'output(0)',
             noreq   literally 'not ccom',
             acon    literally 'ror(rstat,1)',
             perr    literally '10b',
             badcps  literally '100b',
             ok      literally '> 3 and cps < 26',
             rrdy    literally 'rstat',
             clk1    literally '1b',
             clk0    literally '0b',
             drdy    literally '1b';

     declare cps byte,
             wait(22) byte initial
                 (250,200,167,143,125,111,100,91, ...
                  67,63,59,56,53,50,48,45,43,42, ...);

     do forever;
         cstat = 0;

         /* wait until read request */
         do while noreq;
         end;

         /* determine the characters per second rate */
         cps = ccps;

         if acon and rrdy then
         do;
             if cps ok then      /* we are ready */
             do;                 /* to read characters */
                 cdata = rdata;
                 rcom = clk1;
                 rcom = clk0;
                 cstat = drdy;
                 cstat = 0;
                 /* wait for tape to move */
                 call time(wait(cps - 4));
             end; else cstat = badcps;
         end; else cstat = perr;
     end;
     eof

     Figure 2. Another sample PL/M program
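The single-point-of-change idea behind the LITERALLY macros of
Figure 2 can be sketched in C. This is only an analogy under
stated assumptions: the names CDATA_PORT, WAIT_ENTRIES,
wait_table, and port_out are hypothetical, and "#define" is C's
nearest counterpart to LITERALLY, not a PL/M feature.

```c
#include <stddef.h>

/* Hypothetical names standing in for the LITERALLY macros of Figure 2. */
#define CDATA_PORT   1      /* the port number is stated once, here */
#define WAIT_ENTRIES 22     /* the vector size is stated once, here */

static unsigned char wait_table[WAIT_ENTRIES];
static unsigned char last_port;   /* port most recently used by port_out */

/* Stand-in for an output instruction; it only records the port used. */
static void port_out(unsigned char port, unsigned char value)
{
    last_port = port;
    (void)value;
}

/* The size is derived from the declaration, never repeated as 22. */
size_t wait_table_size(void)
{
    return sizeof wait_table / sizeof wait_table[0];
}

/* The port is named through the macro, never written as 1. */
unsigned char send_data(unsigned char value)
{
    port_out(CDATA_PORT, value);
    return last_port;
}
```

Moving the device to another port, or resizing the table, would
then require a change to one definition only.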

Another, less significant, factor which increases readability
is the inclusion of the "$" separator in some of the long
identifiers in the program of Figure 1. This character is
ignored by the scanner in the PL/M compiler when included
within identifiers and numbers.

Examination of the PL/M manual [29] and the programs in
Figures 1 and 2 will reveal that PL/M contains functions which
relate directly to the Intel 8008 and 8080 instruction sets.
Thus PL/M fits the definition given by Lloyd and Van Dam for a
"tailored" language: "A language whose features are explicitly
designed to coincide (to a large extent) with the hardware
capabilities of its object machine ...." [38, p. 540]
Fortunately this is not as serious a drawback as it might seem,
as evidenced by the fact that other microprocessor
manufacturers are now developing or have developed PL/M
compilers for their machines. All of the functions in PL/M
which relate specifically to the 8008 and the 8080 are
implemented as built-in functions; i.e., they are equivalent to
procedures (and variables, in some cases) which are declared in
an encompassing block level hidden from the programmer. Lloyd
and Van Dam [38] recognized that this is an important concept,
and the method by which it is implemented is explained further
in Chapter IV.
The built-in function approach is probably preferable in
firmware applications to the extensible language approach,
although this will probably be a topic of considerable debate
for many years. An extensible language is essentially one which
has a more sophisticated macro capability than PL/M. It allows
the programmer to define new instructions and redefine the base
instructions of the language. This may seem to be a desirable
feature, but unfortunately it violates the principle of
uniformity. Halstead observed that "... the extensible-language
approach ... seemed to open the door to a dangerous,
undisciplined proliferation of overlapping and even
incompatible dialects within a single installation ...."
[23, p. 21^]

B. POTENTIAL MODIFICATIONS 

In order for PL/M to serve as a useful general-purpose,
machine-independent programming language for firmware design,
it will probably be necessary to make some slight
modifications. The changes described below were suggested by
study of other programming languages which are similar in
structure to PL/M, with particular attention being paid to the
C language [49]. This language has relative merits and
shortcomings when compared with PL/M, but it is a good system
programming language which generates efficient machine code for
the PDP-11 series of minicomputers. Most of the items listed
below are convenience features rather than necessities. (Of
course, a major advantage of high-level languages is their
convenience when compared with assembly languages.) Many of
them were not implemented in the original versions of PL/M,
probably because they tend to lead to less efficient machine
code, but most of them would not be difficult to implement and
some might even allow more efficient code to be generated. The
optimization techniques discussed in Section VII.B would be of
benefit for those features with apparent inefficiencies.

One major weakness of PL/M is its paucity of data types. If the
language were to be used as the basis of a firmware design
system, it would need at least a concept of floating point
variables in order to be widely accepted. Other desirable data
types include string and substring, bit, double precision, and
complex. It would also be convenient to have the capability to
define data structures. Implementation of some of these various
data types would probably suggest the need for a few new
instructions for manipulating them efficiently. For example,
double precision arithmetic instructions and string
concatenation instructions would be useful.

For algorithms involving array arithmetic it would be desirable
to have the capability to declare arrays of dimension greater
than one. A related feature is the ability to declare arrays
with variable lower bounds, as in ALGOL W.

Recursion is another feature which PL/M lacks; however, this
may not be significant for firmware design applications.
Recursion allows compact expression of an algorithm but is not
a necessary feature in a language, since a recursive procedure
may be rewritten as an iterative procedure. Recursion usually
sacrifices execution efficiency for programming efficiency, and
great care must be taken in writing recursive procedures in
order to ensure that they do not "blow up."
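The trade between the recursive and iterative forms can be
illustrated with a small C sketch. The factorial function is
chosen here only as a familiar example; it does not appear in
the PL/M programs discussed, and the function names are
hypothetical.

```c
/* Recursive form: compact, but every call consumes stack space. */
unsigned long fact_recursive(unsigned int n)
{
    return (n <= 1) ? 1UL : n * fact_recursive(n - 1);
}

/* Iterative form: same result, using a fixed amount of storage. */
unsigned long fact_iterative(unsigned int n)
{
    unsigned long result = 1UL;

    while (n > 1) {
        result *= n;
        n--;
    }
    return result;
}
```

On a microprocessor with a small stack the second form is the
safer choice, which is consistent with the view that the lack of
recursion need not be a serious handicap for firmware.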

A feature which would prove very useful, especially in large
system development, is the ability to link independently
compiled and tested segments of a program. The current Intel
versions of the PL/M language do not allow this, since the
second pass of each compiler produces absolute machine code.
The implementation of this feature would require the
declaration of "global" or "external" variables, the redesign
of pass 2 to produce relocatable object code, and the design of
a linking loader.

As will be discussed in Chapter VI, the trends in digital
architecture development have encouraged, among other features,
inclusion of multiple high-speed registers and fast
increment/decrement instructions. One way in which to allow the
high-level language programmer to take advantage of such
features is to provide special constructs within the language.
For example, he could be allowed to declare frequently
referenced variables to be "register" variables in order to
increase execution speed (and also produce a slight saving in
the amount of main memory required). The programmer could also
be allowed to write statements such as

     i = ++j - k;

or

     i = j-- - k;

in order to take advantage of the increment and decrement
instructions. The first statement above would generate code to
increment "j," subtract the value of "k" from the new value of
"j," and store the result in "i"; while the second statement
would generate code to subtract the value of "k" from the value
of "j," store the result in "i," and then decrement "j."

Both the register declaration and increment/decrement features
are available in the C language. The increment/decrement
feature should really be just a convenience for the programmer,
since the same statements could be written in C as

     i = (j = j + 1) - k;

and

     i = j - k; j = j - 1;

and the compiler should generate the same code as for the
previous two statements (unfortunately it does not; see Section
VII.B.1). The register declaration in C does result in more
efficient code being generated; however, there may be other
ways to solve the register allocation problem. This point is
discussed further in Section VII.B.2.
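The claimed equivalence between the short and long forms can be
checked with a small C sketch. The struct and function names
are hypothetical; each function returns both the value assigned
to "i" and the value left in "j" so that the two spellings can
be compared directly.

```c
struct result {
    int i;   /* value assigned to i */
    int j;   /* value of j after the statement */
};

/* i = ++j - k; */
struct result preinc_short(int j, int k)
{
    struct result r;
    r.i = ++j - k;
    r.j = j;
    return r;
}

/* i = (j = j + 1) - k; */
struct result preinc_long(int j, int k)
{
    struct result r;
    r.i = (j = j + 1) - k;
    r.j = j;
    return r;
}

/* i = j-- - k; */
struct result postdec_short(int j, int k)
{
    struct result r;
    r.i = j-- - k;
    r.j = j;
    return r;
}

/* i = j - k; j = j - 1; */
struct result postdec_long(int j, int k)
{
    struct result r;
    r.i = j - k;
    j = j - 1;
    r.j = j;
    return r;
}
```

The sketch shows only that the results agree; whether a given
compiler emits the same machine code for both spellings is a
separate question.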

Another potential change in PL/M would involve the addition of
the conditional expression. This would enable the statement

     if a < b then c = a; else c = b;

to be rewritten more concisely as

     c = if a < b then a else b;

and could be done by merely adding a few more productions to
the grammar (see Section IV.B.1). This change would not
increase the efficiency of the generated code.
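For comparison, C already offers both forms. The sketch below
uses hypothetical function names and C's "?:" operator, which
plays the role that the proposed PL/M conditional expression
would play.

```c
/* Statement form:  if a < b then c = a; else c = b; */
int smaller_statement(int a, int b)
{
    int c;

    if (a < b)
        c = a;
    else
        c = b;
    return c;
}

/* Expression form:  c = if a < b then a else b; */
int smaller_expression(int a, int b)
{
    return (a < b) ? a : b;
}
```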

One final area for potential modification involves the CASE
statement and will probably require a great deal more study
than some of the changes suggested above. The CASE construct in
PL/M can be awkward and error-prone in some situations, as
illustrated in Figure 3(a). It should be noted that it would
have been very easy for the programmer to incorrectly count the
number of semicolons in this section of code. Also he has had
to resort to the much-maligned GOTO in order to share code
between two of the cases, and his only control over an
out-of-range value of "c" is to make a test before entering the
CASE group.

Figure 3(b) shows how the same routine would be implemented if
the C language SWITCH statement were available in PL/M. In this
"case" there is no need to count semicolons. The use of the
GOTO is avoided, since the cases may be listed in any order,
and the BREAK is used to exit from the group. Also there is a
specific default case to ensure that appropriate action is
taken for all values of "c."

Both the CASE and the SWITCH constructs have advantages and
disadvantages in comparison with one another. The CASE
statement, despite the drawbacks noted above, will produce more
efficient code in many situations and should not be discarded
in favor of the SWITCH. Ross, et al., highlighted one of the
tradeoffs involved when they noted that

     ... to ensure completeness of case statement control a
     programmer should be permitted by the syntax to specify
     what should happen when a case statement variable is out
     of range. Confirmability applied to the same issue would
     imply a programmer should be required to state what should
     happen. Of course, if he knows that out of range values
     are not possible, this too should be expressible, to
     permit implementation efficiency. [51, p. 25]

Vaughn [58] has suggested that both facilities could be
provided in the same language in the form of a generalized IF
statement. A simpler alternative would seem to be to
incorporate the structure of Figure 3(b) into the present PL/M
language along with the CASE statement.

     if c < 'a' or c > 'u' then call err;
     do case c - 'a';
         ; ; ;
         togd = true;                /* case 'd' */
         ; ; ; ; ; ; ;
         do;                         /* case 'l' */
             l = true;
             go to labl;
         end;
         ; ; ;
         togp = true;                /* case 'p' */
         ; ; ;
         togt = true;                /* case 't' */
         labl: do;                   /* case 'u' */
             lim = 0;
             do while (c := getc) >= '0' and c <= '9';
                 lim = lim * 10 + c - '0';
             end;
             if l then liml = lim;
             else limu = lim;
         end;
     end /* case group */;

                              (a)

     do switch c;
         case 'd': togd = true; break;
         case 'l': l = true;
         case 'u': lim = 0;
                   do while (c := getc) >= '0' and c <= '9';
                       lim = lim * 10 + c - '0';
                   end;
                   if l then liml = lim;
                   else limu = lim; break;
         case 'p': togp = true; break;
         case 't': togt = true; break;
         default: call err;
     end;

                              (b)

     Figure 3. (a) PL/M code using the CASE construct,
               (b) PL/M code using the SWITCH construct
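Because Figure 3(b) is modelled on the C switch statement, the
routine can be written almost line for line in C. The sketch
below is an illustration under assumptions, not the code of any
program cited here: the struct "options", the value
"no_options", and the function "apply_option" are hypothetical,
and the digits are read from a string rather than through
"getc," so the character that ends the number is not consumed.

```c
struct options {
    int togd, togp, togt;   /* toggles set by 'd', 'p', and 't' */
    int liml, limu;         /* limits set by 'l' and 'u' */
    int errors;             /* count of unrecognized option letters */
};

const struct options no_options = {0, 0, 0, 0, 0, 0};

/* Applies option letter c, reading a decimal limit from digits if needed. */
struct options apply_option(struct options opt, int c, const char *digits)
{
    int l = 0;      /* nonzero when the lower limit is being read */
    int lim;

    switch (c) {
    case 'd':
        opt.togd = 1;
        break;
    case 'l':
        l = 1;
        /* fall through: 'l' shares the code of 'u' without a GOTO */
    case 'u':
        lim = 0;
        while (*digits >= '0' && *digits <= '9')
            lim = lim * 10 + (*digits++ - '0');
        if (l)
            opt.liml = lim;
        else
            opt.limu = lim;
        break;
    case 'p':
        opt.togp = 1;
        break;
    case 't':
        opt.togt = 1;
        break;
    default:        /* every out-of-range value of c arrives here */
        opt.errors++;
        break;
    }
    return opt;
}
```

The cases appear in whatever order suits the sharing of code,
and the default case replaces the range test that precedes the
CASE group in Figure 3(a).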
IV. PASS 1 IMPLEMENTATION



Aho and Ullman [3] have suggested that the process of
compilation is composed of seven subprocesses: lexical
analysis, error analysis, bookkeeping, parsing, translation (to
an intermediate form), code optimization, and object code
generation. While it may be difficult to identify all seven of
these subprocesses in any given compiler and their order may
not be the same as that given, this is a good conceptual model.
Figure 4 shows how each of the parts of this model is related
to the others [3, p. 7^].

This chapter documents the initial stages of the design 
of a compiler for user-definable architectures. All but the 
code optimization and code generation phases of this model 
were implemented. The latter two phases are discussed in 
Chapters VII and VIII/ and suggestions are given there for 
their implementation. Recommendations for continuation of 
the design are given in Chapter IX. 

A. THE FORTRAN VERSION 

In order to gain insight into the analysis of some of the
problems presented in other sections of this thesis, it was
felt that some practical experience in compiler implementation
was desirable. For reasons given in Chapter III the PL/M
microprocessor language was chosen for this purpose, and an
attempt was made to implement the commercial FORTRAN version on
a Digital Equipment Corporation PDP-11/50 computer with an
interactive operating system. Unfortunately this proved
unfeasible, and attention was shifted to writing another
version of pass 1 of the compiler in a system programming
language. The latter effort was successful, and a full account
is given below in Section IV.B.

     [Diagram relating the seven subprocesses, from Source
     Program to Object Program]

     Figure 4. Model of a compiler (after [3])

The main reason for the failure of the FORTRAN implementation
was that it required more primary memory than was available on
the PDP-11. When run on the IBM S/360 at the Naval Postgraduate
School, pass 1 of the PL/M compiler requires approximately 120K
bytes of memory. On the PDP-11 only about 56K bytes of user
memory were available, and if all of the object modules could
have been linked and loaded together it is estimated that they
would have occupied about 100-110K bytes of memory.

An attempt was made to divide the routines in such a way that
several sub-passes could be generated, each requiring less than
56K bytes; however, there turned out to be too much
interdependence among the routines, and there was always at
least one partition which required more memory than was
available. This was because the synthesis routine (the one
which generates the intermediate language code) required about
50K bytes by itself, and it required many other routines to be
loaded with it.

Another problem which developed involved the discovery that the
data initialization statements in the Intel PL/M compiler do
not conform to ANSI standard FORTRAN specifications (although
it is claimed that the compiler is written in standard FORTRAN
to enhance transportability). The FORTRAN compiler used for
this project accepts only programs written in standard FORTRAN.
It requires each variable initialized in a DATA statement to be
named individually, and this caused problems with the vast
number of vectors which are initialized in the BLOCK DATA
routine. This by itself was not a critical problem and could
have been overcome without too much difficulty if there had
been justification to continue working with the FORTRAN
version.

After the first attempts to partition pass 1 of the compiler
failed, there were two alternatives available. Either a more
concerted effort could have been made to subdivide the FORTRAN
version, or a completely new version could have been attempted
in a more efficient language. After considering the amount of
effort which would be involved in working with the FORTRAN
version and the inherent inefficiencies entailed in running it
on a 16-bit machine (e.g., it assumes 32-bit integers) it was
decided that it would be simpler and more beneficial in the
long run to write another compiler.

B. THE C VERSION 

After the FORTRAN version was abandoned, pass 1 of the PL/M
compiler was successfully implemented using the compiler
writing facilities supported by the UNIX operating system [50].
Since a secondary objective of the project was to develop a
system for experimenting with compiler design, it proved
worthwhile to utilize these more efficient facilities. Because
of the time constraints placed upon this project only pass 1 of
the compiler was implemented; however, much valuable experience
was gained in the process, and a great deal of this thesis has
been influenced by the results obtained.

1. YACC

For many years compiler writing was more of an art than a
science, but many important developments have taken place over
the last decade to reverse this situation. Some of the most
impressive of these developments have been in the area of
formal language theory and automatic parser generation. "The
ability to generate parsers from a syntactic description of a
language is an important consideration in reducing the cost of
developing reliable translators." [60, p. 3ai]

The parser generator in the UNIX system is known as YACC (Yet
Another Compiler-Compiler) [30]. It has been in use for about
two years at Bell Laboratories where, among other things, it
has been utilized in the development of an easy-to-use language
for a sophisticated mathematics typesetting system [32]. Input
for YACC consists of a syntactic and semantic description of
the grammar of the language for which a parser is desired.
Appropriate languages belong to the class known as LALR
(2/3/0), or look-ahead LR, since they read text from the left,
perform a right-parse, and resolve conflicts by looking ahead
in the text stream. This is a very broad and useful subset of
the context-free languages, and one which includes PL/M. YACC
checks the grammar for conflicts and, if none exist, produces a
set of parse tables for the language.

The semantics associated with each production in the grammar
are transformed by YACC into a C program which contains the
parse tables as data. When this program has been compiled it is
linked with a parse table interpreter, provided by the YACC
library, and any other programs which have been written by the
compiler designer. Actually YACC provides only the core of the
compiler: the parser (which performs the syntactic analysis
function of Figure 4) and a means of communication between the
parse stacks and the programs provided to perform the other
functions (lexical analysis, error analysis, bookkeeping, and
code translation) of the code generation process.

File "m.gram" in Appendix B contains the YACC input for PL/M.
The syntactic notation is somewhat different from the normally
encountered BNF, in which a production might be written as

     <NONTERM1> ::= <NONTERM2> TERMINAL <NONTERM3>

rather than the YACC version

     nonterm1: nonterm2 'terminal' nonterm3

in which all terminal symbols are quoted unless they have been
declared to be terminals (as have "identifier," "number," and
"string" on the first line of "m.gram"). The convention of
using a vertical bar ("|") to indicate the beginning of a
production with the same left side as the immediately preceding
production has been retained from BNF. The semicolon (";") is
used to indicate the end of a set of productions with the same
left side. It should be noted that a quoted semicolon may occur
within a production as a terminal symbol.

Semantics are provided by appending an equal sign ("=")
followed by a C language statement (compound statements are
enclosed in braces, "{" and "}") to a production before either
the vertical bar or the semicolon. The procedures used for
implementing the semantics of PL/M are discussed in Section
IV.B.5.

The extreme flexibility afforded by the use of an automatic
parser generator such as YACC is demonstrated in Figure 5,
which shows the changes required in the PL/M grammar in order
to implement the conditional expression construct (see Section
III.B). Productions 86 and 87 are currently included in the
compiler implemented for this project, and productions 87a-87c
are the new productions which would have to be added.

     expression:   logicalexpression                /* 86  */
               |   variable ':=' logicalexpression  /* 87  */
               |   ifexpression                     /* 87a */
               ;

     ifexpression: trueobject expression            /* 87b */
               ;

     trueobject:   ifclause expression 'else'       /* 87c */
               ;

     Figure 5. Potential syntax changes for adding
     the conditional expression to PL/M (see Chapter III)
2. Data Structures



One of the first and most important steps in design- 
ing a complex software system is the definition of an ap- 
propriate set of data structures. The principal structures 
used in the implementation of the PL/M compiler are 
described below in order to give a fuller understanding of 
the nature of the problem and insight into the changes which 
would be necessary in order to expand the use of the com- 
piler to a more general environment. The declarations of 
all of the stacks and tables used can be found in the file 
"m.decl" in Appendix B. The macros used in the declarations 
are defined in file "m.def." 

The two most important data structures used in a modern
compiler are the parse stack and the symbol table. The parse
stack in an LALR parser is used to store input tokens for the
"shift" and "reduce" operations. In general, there are at least
two parallel stacks which contain various pieces of information
about the tokens. YACC provides parse stacks in its parse table
interpreter routine, with one stack being reserved for values
provided by the scanner. The operation of these stacks is
rather complex and will not be considered here, since Aho and
Johnson [2] have provided an excellent survey of the techniques
involved. Since there was a need to retain more than a single
piece of information about each token, and there was no way to
communicate with the parse stacks in the parse table
interpreter other than to provide a single value, it was
necessary to implement four other stacks for this purpose. The
operation of these stacks is discussed in Section IV.B.3.

The symbol table is important for a number of reasons, not the
least of which is the fact that it is used in conjunction with
the intermediate language output to transmit information to the
later passes of the compiler. It usually accounts for the bulk
of the main memory data storage requirements of the compiler
and must therefore be implemented in as efficient a manner as
possible.

The symbol table is a vector of eight-bit bytes which, during
the course of a compilation, consists of a series of entries of
varying types. The format of a general symbol table entry for
the PL/M compiler is shown in Figure 6. This is the type of
entry which is generated for all variables and procedures
declared by the programmer. Reserved words and macro
definitions also are represented by symbol table entries, but
the formats of these entries are slightly different from that
shown in Figure 6. The differences are described below,
following the description of the general type.

     Figure 6. Format of a general symbol table entry

The first three bytes of the format are common to all three
types of entries and are referred to as fixed information
("finfo" in the programs). The first byte contains the "last"
field, which specifies the number of bytes to the beginning of
the preceding entry and is used for chaining downward through
the table (as, e.g., when printing or dumping the symbol
table). It should be noted that since the "last" field contains
only eight bits each symbol table entry is limited to 256
bytes, although it will generally be much shorter. This in turn
ultimately limits the lengths of variable and procedure names
and macro definitions, since the characters for describing
these attributes must fit into the remainder of the entry after
the fixed information and other fields have utilized some of
the 256 bytes.
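The downward chaining through the "last" field can be sketched
in C as follows. This is a minimal illustration and not the
code of Appendix B: the function name and the example table are
hypothetical, and the convention that a "last" value of zero
marks the bottom entry is an assumption made for the sketch.

```c
#include <stddef.h>

/* Example table: entries begin at indices 0, 4, and 7. The first byte of
   each entry is its "last" field; the remaining bytes are filler here. */
const unsigned char demo_table[] = {
    0, 'a', 'b', 'c',       /* bottom entry, "last" = 0 (assumed marker) */
    4, 'd', 'e',            /* preceding entry begins 4 bytes earlier */
    3, 'f'                  /* preceding entry begins 3 bytes earlier */
};

/* Counts the entries reached by chaining downward from the entry at top. */
size_t count_entries(const unsigned char *table, size_t top)
{
    size_t count = 1;
    size_t pos = top;

    /* Stop at the bottom marker, or if a distance would pass index 0. */
    while (table[pos] != 0 && table[pos] <= pos) {
        pos -= table[pos];
        count++;
    }
    return count;
}
```

A routine for printing or dumping the table would follow the
same loop, decoding each entry at "pos" instead of counting it.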

The second byte of the entry contains three fields: "type,"
"precision (prec)," and "based (b)." The "type" field consists
of four bits and is used to distinguish among the various types
of entries (variable, reserved word, macro, vector, etc.). The
alternatives can be found in file "m.def." The precision field
contains three bits and is most commonly used to represent the
precision (i.e., the number of bytes required) of variables,
vectors, or the result of a function procedure call (zero
indicating no value returned). The "based" field, if set to
"1," indicates a based, or indirect, variable.
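The packing of the three fields into one byte can be sketched
in C. The field widths (four, three, and one bit) are taken
from the text; the bit positions, and the function names, are
assumptions made for this sketch and may not match the layout
of Figure 6 or the macros of "m.def."

```c
/* Assumed layout: bits 7-4 "type", bits 3-1 "prec", bit 0 "based". */
unsigned char pack_attr(unsigned type, unsigned prec, unsigned based)
{
    return (unsigned char)(((type & 0x0Fu) << 4) |
                           ((prec & 0x07u) << 1) |
                           (based & 0x01u));
}

/* Each accessor shifts its field down and masks off the other fields. */
unsigned attr_type(unsigned char b)  { return (b >> 4) & 0x0Fu; }
unsigned attr_prec(unsigned char b)  { return (b >> 1) & 0x07u; }
unsigned attr_based(unsigned char b) { return b & 0x01u; }
```

Under this assumed layout a four-bit "type" allows sixteen
kinds of entry and a three-bit "prec" allows precisions from
zero to seven bytes.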

Next is the "size" field, in the third byte. This field is used
to indicate the length of the following two fields, "name" and
"hcoll." The "name" field contains the printname of the symbol
and has a length equal to the number of characters in the name.
The "hcoll" field is always two bytes long and contains the
absolute address of the previous symbol table entry whose
printname has the same hash code as this symbol (see Section
IV.B.3). Thus the value of "size" is equal to the length of the
printname plus two.



During the course of this project it was found that
compiler-generated labels were the only entries which had no
printname, and the entries for these symbols conveyed no
information other than the symbol number (see below). Since
they only took up precious symbol table space (especially for
long programs, which already require a great deal of space),
these entries were eliminated from the symbol table.
Examination of the symbol table in Figure 8 makes it evident
which symbol numbers are used for compiler-generated labels,
since these are the only symbols which do not have entries
(e.g., S25, S28, S29).

Following the "hcoll" field in the general symbol table entry
is the "syno" field. This field contains the symbol number of
the entry. Each time a new symbol is declared by the programmer
an entry of this type is made, and the next sequential symbol
number is assigned. The "syno" field is ten bits long, and thus
there can be as many as 1024 different symbols in any program
(including compiler-generated labels).

The final field in the general entry is the "length" field,
which indicates the number of elements in a vector or the
number of arguments required by a procedure. In the latter case
"length" may be zero, for a procedure with no arguments, or as
large as 63, since a procedure definition uses only the first
six bits of this field. (This restriction could easily be
changed; however, it is doubtful whether any procedure in a
well-written program would have more than 63 arguments.)
A saving of table space is accomplished by classifying vectors
into two categories, short and long, depending upon whether or
not they contain fewer than 64 elements. In the case of short
vectors (distinguished from long vectors by the "type" field)
and variables, the byte containing the last eight bits of the
"length" field is deleted, as discussed in Section IV.B.3.

Figure 7 shows the changes required in the general format of
Figure 6 for reserved words, macro definitions, and based
variables. As indicated, all fields from "last" through "hcoll"
remain as in the general format. Figure 7(a) indicates that the
entry for a reserved word (e.g., "do," "for," "while") has one
additional byte, the "resno" field, containing the reserved
word number, which is important in the parsing process. Since
this field contains only eight bits, there can be no more than
256 reserved words in the language. Following the "hcoll" field
in the entry for a macro definition (Figure 7(b)) are the
"msize" and "mdef" fields, the former giving the number of
characters in the definition (restricted to a maximum of 255)
and the latter containing the definition. For a based variable
the "based" field contains a "1," and there is a "bsyno" field
inserted between the "syno" field and the "length" field, as
shown in Figure 7(c). The "bsyno" field contains the symbol
number of the variable which serves as the base. The six unused
bits in this type of entry are wasteful, but most programs do
not contain many based variables.
     Figure 7. Format modifications for
     (a) reserved words, (b) macro definitions,
     (c) based variables
Now that the various fields in a symbol table entry have been
explained, it should prove useful to look at an example. Figure
8 shows the symbol table which was constructed by the PL/M
compiler for the square root program of Figure 1. Each line of
the printed table corresponds to one entry in the symbol table.
The reserved words, which are stored immediately below symbol
S0, are not shown in this table. It should be noted that there
are no "syno," "based," "precision," or "length" field entries
for macro definitions. The "name" column of the table contains
the printnames of all entries and the "msize" and "mdef" fields
for macros.

A very important point to note here is that symbols S0-S22 were
not declared in the square root program but are the variables
and procedures which relate PL/M to the Intel 8080. These
symbols were placed into the symbol table during the
initialization of the compiler and can be considered to have
been declared in an outer block encompassing the square root
program. The manner in which this was done can be seen by
examining file "m.main.c" in Appendix B. Since it is very easy
to change the names and attributes of these symbols in
"m.main.c" it is also very easy to tailor the language to the
architecture of the machine for which the object code is to be
generated (see Chapter III). The meanings of these symbols need
not be of concern during the first pass of the compilation but
will be important during later passes.
     Syno   B   Pr   Len   Type   Size   Name
     S67        1    10    12     13     monitoruses
     S60        1    98    11      9     heading
                            1      6     crlf 5 cr,lf
     S58        2     1     2      3     i
     S52        1    16    12      6     temp
     S51        1     1     2      3     j
     S50        1     1     2      3     i
     S48        1     1     2     14     zerosuppress
     S47        1     1     2      7     chars
     S46        1     1     2      6     base
     S45        2     1     2      8     number
     S44        0     4     6     13     printnumber
     S41    *   1     1     2      6     char Based S37
     S40        1     1     2      3     i
     S38        1     1     2      8     length
     S37        2     1     2      6     name
     S36        0     2     6     13     printstring
     S35        1     1     2      3     i
                            1      9     bitcell 2 91
     S31        1     1     2      6     char
     S30        0     1     6     11     printchar
     S27        2     1     2      3     z
     S26        2     1     2      3     y
     S24        2     1     2      3     x
     S23        1     1     6     12     squareroot
                            1      7     false 1 0
                            1      6     true 1 1
                            1      4     lf 3 0ah
                            1      4     cr 3 15q
                            1      5     tto 1 2
     S22        2     1     7      0
     S21        2     2     8      5     dec
     S20        2     2     8      8     double
     S19        1     1     8      6     move
     S18        1     1     8      6     last
     S17        1     1     8      8     length
     S16        1     1     8      8     output
     S15        1     1     8      7     input
     S14        1     1     8      5     low
     S13        1     1     8      6     high
     S12        0     1     8      6     time
     S11        1     2     8      5     scr
     S10        1     2     8      5     scl
     S9         1     2     8      5     shr
     S8         1     2     8      5     shl
     S7         1     2     8      5     ror
     S6         1     2     8      5     rol
     S5         2     1     7     10     stackptr
     S4         1     1     7      8     memory
     S3         1     1     7      8     parity
     S2         1     1     7      6     sign
     S1         1     1     7      6     zero
     S0         1     1     7      7     carry

     Figure 8. PL/M symbol table
     for the program of Figure 1
3. The Parser



The main function of pass 1 of the PL/M compiler is to convert
the source language program into a form which can be used by
remaining stages of the compiler to generate machine code. The
source language program is represented in the computer as a
linear string of ASCII characters, organized as a series of
"identifiers," "numbers," and "strings" (all called "tokens").
This series of tokens is the "text stream" for the compiler. In
order to perform the translation, pass 1 must parse the
program; i.e., it must examine the text stream and determine
which of the rules of the PL/M grammar can be applied in order
to reduce the tokens to a "statement list" and finally to a
"program" (see file "m.gram" in Appendix B). This section
contains an overview of the parsing and symbol table functions
of pass 1.

When the parser provided by YACC requires a token from the
text stream, it calls the user-provided routine "yylex." (In
this section "user" refers to the compiler designer rather
than the firmware designer.) This routine and the routines
which it calls are listed in file "m.scan.c" (Appendix B).
"yylex" calls "gettoken," which constructs tokens from the
input characters, determines which of the three types of
tokens (or a special character--e.g., comma, semicolon) it has
found, and computes a hash code for each identifier. The
latter function is accomplished by forming the sum, modulo
128, of the ASCII values of the characters in the printname of
the identifier. This hash code is used by "yylex" later for
looking up the identifier in the symbol table.
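
The hash computation described above is small enough to show
in full. The following is a minimal sketch in C, not the code
of "m.scan.c"; the function name "hash_printname" is an
assumption made for illustration.

```c
#include <stddef.h>

/* Hash an identifier's printname: the sum of its ASCII character
 * codes, taken modulo 128, so the result lies in 0..127.
 * The name hash_printname is hypothetical. */
unsigned hash_printname(const char *name)
{
    unsigned sum = 0;
    for (size_t i = 0; name[i] != '\0'; i++)
        sum += (unsigned char)name[i];
    return sum % 128u;
}
```

Because only the character sum matters, identifiers that are
permutations of one another (for example "ab" and "ba") always
receive the same hash code, which is why a collision chain is
needed in the symbol table.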

The vector "varc" (Figure 9) is used by "gettoken" to
accumulate characters from the input string. Several tokens
may be accumulated in "varc" before being used by the parser,
and the variable "tokindex" is used to indicate the element of
"varc" which is the beginning of the current "accumulator."
The first byte of each accumulator contains the length of the
token, thus limiting the length of each token to no more than
254 characters. Since the length of "varc" is normally less
than 255, and it may contain more than one token, the upper
bound on the length of a token is usually much less than 254.

Once "gettoken" has completed its functions, control returns
to "yylex," which may take one of several sets of actions,
depending on the type of token scanned. If an end of file
character or other special character was scanned, "yylex"
returns the character to the parser. If a number was scanned,
"yylex" reports this to the parser and returns the value of
the number. If either a string or an identifier was scanned,
"yylex" "pushes" information onto the user-controlled parsing
stacks (Figure 9). (The stack manipulation routines are listed
in file "m.aux.c.") In the case of a string, the stack pointer
("sp") is incremented, "var[sp]" is assigned the current value
of "tokindex," and "tokindex" is advanced to the value of the
next free location in "varc." The fact that a string was
scanned is reported to the parser along with the current value
of "sp," as discussed in Section IV.B.5.

Figure 9. Scanning and parsing stacks

The actions for an identifier are somewhat more complicated,
since the identifier may be a reserved word, macro call, or
programmer-defined word. In order to determine which case is
applicable, the identifier is looked up in the symbol table by
finding its address in the element of the vector "hentry"
given by the hash code computed by "gettoken." If the address
is other than the zeroth element of the symbol table, the
printname stored in "varc" is compared with the printname
stored in the table entry. If the names do not match, the
value of "hcoll" is used as the next address in the search.
This process continues until either a match is found or the
current value of "hcoll" is the address of the zeroth element
of the symbol table. If an entry for the identifier is located
in the symbol table, the "type" field is examined to determine
whether it is a reserved word or a macro. In the former case,
the reserved word number is returned to the parser. In the
latter case, the scanner is set up to begin reading input
characters from the "mdef" field of the symbol table entry,
and "gettoken" is called again.

If the identifier is neither a reserved word nor a macro,
information about it is "pushed" onto the parsing stacks in
the manner discussed above for strings. In addition to the
information stored in "var," the address of the symbol table
entry is stored in "symloc," and the hash code is stored in
"hash" (see Figure 9). The "fixv" stack is used to hold other
types of information during the parsing process. The fact that
an identifier was scanned is reported to the parser along with
the current value of "sp."

If the symbol table search is unsuccessful, an entry must be
made using the routines in file "m.sym.c." The entry is made
immediately following the most recent previous entry. The
"hcoll" field of the new entry is set to the value of
"hentry[hashcode]," and the value of "hentry[hashcode]" is
changed to the address of the new entry. It is assumed at this
time that both parts of the "length" field will be required.
If it is discovered (during the parsing of later text) that
only the first six bits of the "length" field are needed, the
"compress" routine (file "m.sym.c") must be called to remove
the extra byte in order to save space in the table.
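
The search and entry steps described above amount to a hash
table with collision chains in which new entries are linked in
at the head of the chain. The following C sketch illustrates
the idea under simplifying assumptions: entries are
fixed-size array elements rather than the variable-length
records of "m.sym.c," and the names "lookup_symbol,"
"enter_symbol," "last_entry," and the size limits are
hypothetical. Only "hentry" and "hcoll" are taken from the
text.

```c
#include <string.h>

#define HASH_SIZE   128   /* hash codes run from 0 to 127 */
#define MAX_ENTRIES 64    /* hypothetical table capacity */
#define MAX_NAME    32    /* hypothetical printname limit */

/* One symbol table entry. Index 0 is reserved as the "zeroth
 * element" that terminates every collision chain. */
struct entry {
    int  hcoll;              /* next entry with the same hash code */
    char name[MAX_NAME];     /* printname */
};

struct entry table[MAX_ENTRIES];
int hentry[HASH_SIZE];       /* head of each chain; 0 means empty */
int last_entry = 0;          /* index of the most recent entry */

/* Sum of the character codes modulo 128, as in the scanner. */
unsigned hashcode(const char *name)
{
    unsigned sum = 0;
    while (*name)
        sum += (unsigned char)*name++;
    return sum % HASH_SIZE;
}

/* Follow the chain for this hash code. Returns the entry index,
 * or 0 if the name is not in the table. */
int lookup_symbol(const char *name)
{
    int addr = hentry[hashcode(name)];
    while (addr != 0 && strcmp(table[addr].name, name) != 0)
        addr = table[addr].hcoll;
    return addr;
}

/* Make a new entry right after the most recent one and link it in
 * at the head of its chain. Returns 0 if there is no room. */
int enter_symbol(const char *name)
{
    unsigned h = hashcode(name);
    if (last_entry + 1 >= MAX_ENTRIES || strlen(name) >= MAX_NAME)
        return 0;
    last_entry++;
    strcpy(table[last_entry].name, name);
    table[last_entry].hcoll = hentry[h];  /* old head becomes successor */
    hentry[h] = last_entry;               /* new entry becomes the head */
    return last_entry;
}
```

Linking at the head of the chain means the most recently
declared name with a given hash code is found first, and no
search for the end of the chain is needed when an entry is
made.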

The parser itself uses tables generated from the grammar (file
"m.gram") by YACC in order to perform the translation from the
source language to the intermediate language (see Chapter V).
It does this by shifting tokens onto a set of parsing stacks
hidden from the compiler designer. When the tokens match one
of the rules of the grammar, a reduction is made by replacing
the tokens on the stack with the symbol on the left side of
the rule (or production). The methods used for detecting and
recovering from errors in the input and the techniques for
generating the intermediate language code are discussed in the
next two sections.
4. Error Recovery

In the discussion of the scanner in Section IV.B.3 it was
assumed that the input stream constituted a valid PL/M
program. Unfortunately, this is not always the case,
especially in the early stages of program development. In
addition to the other tasks which a scanner must perform,
therefore, it must be able to detect errors and report them to
the programmer. One measure of a good compiler is its ability
to accurately report all program errors.

Debugging of a large program would be greatly inhibited if the
compilation terminated after the detection of a single error.
Thus it is desirable for the scanner to have error recovery
mechanisms which enable it to continue processing after
detecting and reporting an error. The error handling and
recovery techniques included in the YACC version of the PL/M
compiler are discussed in this section.

There are three basic kinds of errors which may appear in a
program--logic, syntactic, and semantic. Logic errors are
errors in the programmer's thought processes which cause him
to write statements which do something other than what he
intended. For example, he might write an expression
incorrectly or use the wrong indexing variable when working
with a vector. It is impossible for a compiler to detect
errors of this type unless they also result in syntactic or
semantic errors.

Syntactic errors result from the violation of the grammatical
rules of the language. The rules for a programming language
like PL/M are given in terms of a series of productions, as in
file "m.gram" in Appendix B. One of the main advantages of
using a parser derived from such a grammar is that it
immediately detects and reports syntactic errors.

Semantic errors are errors which do not violate the rules of
the language but which do not have any meaning (or have an
incorrect meaning) in the language. It is easy to write
nonsense sentences in English which are grammatically correct.
An example of a semantic error in a programming language is
the use of a variable before it is declared. Some languages
allow this, but in the current YACC version of the PL/M
compiler this is not allowed, since proper symbol table
entries are made only for declaration statements.

At this point it should be helpful to look at an example.
Figure 10 lists the sample PL/M program of Figure 1 with
several errors intentionally introduced. When this program was
run through the compiler the output was as shown in Figure 11.
It should be noted that there are two basic types of errors
identified in the output. Syntactic errors are identified by
the term "syntax error," while semantic errors are identified
by the term "compile error."

 1. 2048: /* is the origin of this program */
 2. declare tto literally '2', cr literally '15q',
 3.         lf literally '0ah',
 4.         true literally '1', false literally '0';
 5.
 6. squareroot: procedure(x byte;
 7.     declare (x,y,z) address;
 8.     y = x; z = shr(x+1,1);
 9.     do while y <> z;
10.         y = z; z = shr(x/y+y+1, 1);
11.     end;
12.     return y
13. end squareroot;
14.
15. print$char: procedure(char);
16.     declare bit$cell literally '91',
17.         (char,i) byte;
18.     output(tto) = 0;
19.     call time(bit$cell);
20.     do i = o to 7;
21.         output(tto) = char; /* data pulses */
22.         char = ror(char,1);
23.         call time(bit$cell;
24.     end;
25.     output(tto) = 1;
26.     call time(bit$cell + bit$cell);
27.     /* automatic return is generated */
28. end print$char;
29.
30. print$string: procedure(name, length);
31.     declare name address,
32.         (length,i,char based name) byte;
33.     do i = 0 to length - 1
34.         call print$char(char(i);
35.     end;
36. end print$string;
37.
38. print$number: procedure(number,base,chars,zero$suppress);
39.     declare number address,
40.         (base,chars,zero$suppress,i,j) byte;
41.     declare temp (16) byte;
42.     if chars > last(temp) then chars = last(temp);
43.     do i = 1 to chars;
44.         j = number mod base + '0';
45.         if j > '9' then j = j + 7;
46.         if zero$suppress amd i <> 1 and number = 0 then
47.             j = ' ';
48.         temp(length(temp) - i) = j;
49.         number = number / base;
50.     end;
51.     call print$string(.temp + length(temp) - chars, chars);
52. end print$number;
53.
54. declare i address,
55.     crlf literally 'cr,lf',
56.     heading data (crlf,lf,lf,
57.     ' table of square roots',
58.     crlf,lf,
59.     ' value root value root value root value root',
60.     ' value root ',
61.     crlf,lf);
62.
63. /* silence tty and print computed values */
64. output(tto) = 1;
65. do i = 1 to 1000;
66.     if i mod 5 = 1 then
67.         do; if i mod 250 = 1 then
68.             call print$string(.heading,length(heading));
69.         else
70.             call print$string(.(cr,lf,2);
71.         end;
72.     call print$number(i,10,6,true /* true suppresses
73.         leading zeroes */);
74.     call print$number(square$root(i),10,6, true);
75. end;
76.
77. declare monitor$uses (10) byte;
78. eof

Figure 10. PL/M square root program with errors

syntax error,  line 6,  on input: byte
syntax error,  line 13, on input: end
compile error, line 20: variable undeclared
compile error, line 20: identifier cannot be a variable
syntax error,  line 23, on input: ;
syntax error,  line 34, on input: call
compile error, line 35: identifier required
syntax error,  line 36, on input: end
compile error, line 46: variable undeclared
syntax error,  line 46, on input: identifier
syntax error,  line 70, on input: ;

Figure 11. Compiler output for program of Figure 10

In order to allow the parser to continue scanning the input
after a syntactic error is encountered, YACC allows an "error"
production to be included in the grammar. Production 18 in
file "m.gram" in Appendix B is the error production used for
the PL/M compiler. In this production "error" is a reserved
terminal symbol name, and it causes a state to be included in
the parser which will be entered any time an invalid symbol is
scanned.

    When an error is seen, the currently active states are
    popped, one by one, until a state is reached which has a
    shift on error. This shift is then done, and the reduction
    performed. The user may specify an action, to do things
    such as position the input string and repair the symbol
    table. After this reduction is done, a flag is set, and
    the parser remains in error state until three input
    symbols have been successfully shifted. If an error takes
    place when the parser is still in error state, the input
    symbol is discarded and no new message is produced.
    [30, p.13]

The reason for discarding input symbols if an error occurs
while the parser is still in error state is to prevent a
simple syntactic error from causing an inordinate number of
misleading messages to be generated. Of course, if there are
any actual errors in the text while the parser is in the error
state they will be ignored. For example, in Figure 11 it can
be seen that the parser discovered an error at the beginning
of line 34 when it encountered the symbol "call" without
scanning a semicolon. Figure 10 shows that there is a missing
parenthesis at the end of line 34, and this was not detected
by the parser, since it was still in error state. This is not
a serious problem, since the parser would detect this error on
the second compilation attempt, after the errors detected on
the first try were corrected.

The error production used in this compiler causes the parser
to scan until finding a semicolon before attempting to
continue parsing. This was found to be an effective, although
simple, error handling technique. The actions which must be
taken to allow parsing to continue without overflowing the
various stacks and tables can be seen by examining the
listings in Appendix B. Since PL/M is a statement oriented
language rather than a card oriented language (such as
FORTRAN) and statements are usually relatively short, most
errors will be detected by this scheme. In future work it
might prove beneficial to explore additional schemes, such as
scanning to a comma or a close parenthesis.
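
The effect of scanning to a semicolon can be illustrated
outside of YACC. The C sketch below is not the recovery code
generated for production 18; it only shows the resynchronizing
step, with tokens reduced to single characters for brevity.
The function name "resync_after_semicolon" and its interface
are assumptions made for illustration.

```c
#include <stddef.h>

/* Given the index at which a syntax error was detected, return
 * the index of the first token after the next semicolon, where
 * parsing may resume. If no semicolon remains, return n (end of
 * input). Each character of tokens stands for one token. */
size_t resync_after_semicolon(const char *tokens, size_t n, size_t err)
{
    size_t i = err;
    while (i < n && tokens[i] != ';')
        i++;                      /* discard tokens up to the semicolon */
    return (i < n) ? i + 1 : n;   /* resume just past it, or stop */
}
```

Any further errors between the point of detection and the
semicolon are skipped without a message, which is the behavior
observed for the missing parenthesis on line 34 of Figure 10.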

The actions required for detecting and reporting semantic
errors are not as easy to specify as are those for syntactic
errors, since they are scattered throughout the grammar. For
each production in the grammar the compiler designer must
consider the meaning of any actions which are to be taken and
what circumstances will cause the actions to be incorrect. For
example, the discussion in Section IV.B.2 points out that a
procedure may have no more than 63 arguments. Thus the actions
associated with the parsing of a "parameter list" (production
42 in file "m.gram") must include a check for the number of
arguments. Since it is very difficult to check all possible
error conditions of this type, semantic errors are much more
difficult to detect effectively than are syntactic errors. In
Figure 11, for example, it can be seen that the error on line
20 ("o" in place of "0") causes a redundant error message to
be generated. Improvement of the semantic error detection and
reporting mechanisms in this compiler would be a worthwhile
undertaking for future work.
5. Semantics and Code Emitting

The method for converting semantic actions, provided by the
compiler designer, into a C language program is discussed
briefly in Section IV.B.1. In order for the action statements
to communicate with the parser, a special notation using "$"
variables is employed by YACC. An example of this notation may
be seen by examining the action statements associated with
production 76 in file "m.gram" (Appendix B). Each symbol on
the right-hand side of a production (to the right of the
colon) corresponds to a pseudo-variable, the name of which is
composed of a dollar sign followed by a digit indicating the
relative position of the symbol in the production. Thus
"identifier" has the corresponding pseudo-variable "$1" in
production 76. There is always one and only one symbol on the
left-hand side of a production, and it has the corresponding
pseudo-variable "$$" associated with it. The notation is a
convenience for the compiler designer, and all
pseudo-variables are converted by YACC into actual C language
variables before compilation.

In Section IV.B.3 it is stated that the current value of "sp"
is passed to the parser when an identifier (other than a
reserved word or macro) is scanned. Any such information
passed by "yylex" to the parser may be accessed in the action
statements by referring to the appropriate variable. Thus, in
production 76, the value of "$1" is the value of "sp" received
from "yylex." This value is first passed as an argument to the
procedure "symcheck" (file "m.act.c"), which checks to see if
the variable has been previously declared. Since production 76
is only applied during the parsing of declaration statements,
this constitutes a check for a semantic error--the
redeclaration of a variable within the same block of the PL/M
program. The next three statements are executed if this is the
first time the variable is being declared in the current
block. First, the value of "fixv[sp]" is set to zero to
indicate that this is not a based variable. The next statement
("$$ = sytop") is used to communicate to the next production
(possibly 72) the location at which the symbol table entry for
this variable begins. The third statement calls the "enter"
routine (file "m.sym.c") to actually make the entry in the
symbol table. Whether or not an entry is made in the symbol
table, the final statement is executed to clear the
information associated with this variable from the
user-controlled stacks. The parser then makes the reduction
indicated by production 76 and stores the information
associated with "$$" in one of its parse stacks.

An example of a production which causes intermediate language
code to be emitted can be seen in the action statement
associated with production 96. In this statement, "$2" refers
to a value received from a previously applied production (one
of the set 97-102). The "emit" routine (file "m.act.c") is
called with two arguments, the first giving the prefix ("OPR")
and the second giving the operator determined by the value of
"$2" (see Appendix A). The "emit" routine writes the
information onto a disk file which may be used by later stages
of the compiler to generate machine code.

It should be noted that the actions associated with production
79 also write information to a disk file using the "putw"
routine provided by the C language library. This second output
file is used to store all "initial" values declared in the
PL/M program being compiled.

One final point can be made by considering the action
statements associated with production 33. Productions such as
this are connected with the flow of control statements in
PL/M, and they cause compiler-generated labels to be produced
in order to effect proper branching. Since these labels are
often not generated in the same sequence in which they must
appear in the intermediate language code, there has to be a
mechanism for storing them until they are emitted. As in the
case of production 33, labels are generated by incrementing
the variable "nsym." Code for a conditional branch is emitted
at this point, but the label must be saved until the remainder
of the code around which the branch occurs has been generated.
This is done by calling the "spush" procedure (file
"m.act.c"), which pushes the label onto the "cstack." In order
to save space in the compiler, the "cstack" is actually not a
separate stack but rather an area at the top of the space
allocated to the symbol table.
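
The label mechanism described above can be sketched in C as a
counter plus a stack that shares one array with the symbol
entries, growing downward from the top. This is an
illustration only, not the code of "m.act.c": the array layout,
its size, and the names "new_label," "push_label,"
"pop_label," "table_area," "symbol_top," and "cstack_top" are
assumptions. Only "nsym" is taken from the text.

```c
#define TABLE_SIZE 256        /* hypothetical size of the shared area */

int table_area[TABLE_SIZE];   /* symbol entries grow up from index 0 */
int symbol_top = 0;           /* next free slot for symbol entries */
int cstack_top = TABLE_SIZE;  /* label stack grows down from the top */
int nsym = 0;                 /* counter for compiler-generated labels */

/* Generate a fresh label number by incrementing the counter. */
int new_label(void)
{
    return ++nsym;
}

/* Save a label at the top of the shared area. Returns 0 on
 * success, or -1 if the stack would run into the symbol entries. */
int push_label(int label)
{
    if (cstack_top - 1 < symbol_top)
        return -1;
    table_area[--cstack_top] = label;
    return 0;
}

/* Retrieve the most recently saved label, or -1 if none is saved. */
int pop_label(void)
{
    if (cstack_top >= TABLE_SIZE)
        return -1;
    return table_area[cstack_top++];
}
```

Because nested control statements finish in the reverse of the
order in which they begin, a last-in, first-out stack returns
each label at the point where its definition must be emitted.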

There are obviously many more details concerning the
performance of this compiler than can be presented here. The
YACC reference document [30] should be consulted for a more
complete discussion of the capabilities of YACC, and the
program listings in Appendix B should be studied in order to
determine how these capabilities were applied to the PL/M
compiler.
V. THE INTERMEDIATE LANGUAGE

A. FUNCTION

A very important concept in the design of a compiler for
user-definable architectures is that of the intermediate
language. The compiler model shown in Figure 4 assumes that
the source program will be translated to an intermediate form,
although it is certainly possible to design a compiler which
translates directly to machine code. In fact many compilers
have been designed in the latter way, but they lack
transportability and are not able to easily take advantage of
the more advanced optimization techniques.

The idea of using an intermediate language dates back at least
as far as 1958, when there was a discussion of the need for a
universal computer-oriented language (UNCOL) in some of the
early issues of the Communications of the ACM [12,51,55]. The
intent was to have the UNCOL serve as an intermediary between
high-level languages and machine languages. This would allow a
compiler writer to concentrate on translating from his
high-level language to the UNCOL without worrying about the
machine code considerations. It would also allow a system
programmer for a given machine to write a generator program
which produced the best machine code, independent of the
high-level language. Whenever a new machine was obtained at a
computer installation only the program which translated from
UNCOL to machine code would have to be changed in order to
continue using the high-level languages which had been used
previously. Old programs could be recompiled without changing
the source language. Similarly, whenever a new language was
designed it would be necessary only to write a translator
which could convert programs written in the new language to
UNCOL programs. The new language would then be available to
users of any computer for which an UNCOL-machine code
translator had been written.

No such universal language has ever been developed, but the
concept of an intermediate language has been used by many
software designers in writing compilers which could generate
code for computers with different architectures and
instruction sets (e.g., the PL/M compilers available from
Intel for the 8008 and 8080 microprocessors). The fact that an
intermediate language is useful in compiling for
user-definable architectures is verified by the use of such a
mechanism by those who are trying to design high-level
languages for microprogrammable machines [18,47].

The main function of the intermediate language in the PL/M
compiler is to serve as an information transmission medium
between pass 1 and succeeding passes. In this role it is
complemented by the symbol table (Section IV.B.2) and the
initial value file (Section IV.B.3). The symbol table
transmits the names and other attributes of the symbols used
in the high-level program, the initial value file transmits
the initial values of variables which are to be initialized,
and the intermediate language carries information about the
actual program steps required by the algorithm. Other
information may be contained in the intermediate language
code, e.g., the line number markers which are helpful in
providing good diagnostics and information to aid the
programmer but are not really needed for the code generation
process.

Besides its use in transmitting information the intermediate
language may have an important role in the process of
debugging and simulating the actions of programs translated by
the compiler. As explained in Section V.B below, the
intermediate language code for the PL/M compiler can be
considered to be the "machine code" of a mythical stack
machine. It would not be difficult to write an interpreter
which could read this code and simulate the actions of the
mythical machine in order to help the programmer debug his
high-level language program. Broca and Merwin [9] have devoted
a paper to this topic, and Reigel and Lawson [48] have
indicated that this could provide an important facility.

B. THE POLISH REPRESENTATION

The intermediate language code for the PL/M compiler is based
upon postfix Polish notation. The reason for using this type
of intermediate language is demonstrated in Figure 12, which
shows how an expression written in infix (algebraic) notation
would be translated. The bottom-up parsing process for the
infix expression in Figure 12(a) can be visualized in the tree
structure of Figure 12(c), which is a two-dimensional
representation of the postfix expression in Figure 12(b). Thus
the intermediate language code of Figure 12(d) results
naturally from the parsing of the infix expression. Noteworthy
is the one-to-one correspondence between symbols in Figure
12(b) and lines of code in Figure 12(d).

Figure 12. Example of Polish intermediate language code:
(a) infix expression, (b) equivalent postfix expression,
(c) tree representation, (d) intermediate language code
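
The relationship between the tree and the postfix form can be
shown with a short C sketch: a post-order walk of an
expression tree emits exactly one symbol per node, operands
before their operator. The expression used in the usage note
(a + b * c) is an illustrative choice and is not the
expression of Figure 12; the names "node" and "to_postfix" are
hypothetical.

```c
#include <stddef.h>

/* Expression tree node. A leaf holds an operand name in sym and
 * has no children; an interior node holds an operator character. */
struct node {
    char sym;
    const struct node *left, *right;
};

/* Post-order walk: left subtree, right subtree, then the node
 * itself. Appends one symbol per node to out, starting at index
 * len, and returns the new length. The caller must supply a
 * buffer with room for every node in the tree. */
size_t to_postfix(const struct node *n, char *out, size_t len)
{
    if (n == NULL)
        return len;
    len = to_postfix(n->left, out, len);
    len = to_postfix(n->right, out, len);
    out[len++] = n->sym;
    return len;
}
```

For the tree of a + b * c, with "+" at the root and "*" as its
right child, the walk produces the five symbols a b c * +, one
per node, which mirrors the one-to-one correspondence between
postfix symbols and lines of intermediate code noted above.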

This type of intermediate language representation is usually
discussed in texts on compiler theory, but the presentations
usually do not provide many details about the methods used for
an entire practical program. Often the discussion is limited
to the type of information presented in Figure 12, but
operators other than the simple arithmetic type are required
in a language designed to represent real programs. For
example, operators are required for branching, subscript
calculation, and stack manipulation. The complete list of
intermediate language prefixes and operators used in the PL/M
compiler, along with their meanings, is given in Appendix A.

In order to provide an example of how the intermediate
language is used to represent a program, Figure 13 presents
part of the code generated by pass 1 of the PL/M compiler for
the square root program of Figure 1. The symbol table for this
program can be found in Figure 8. Noticeable in this figure
are the expansion factor and the loss of understandability
apparent in going from the high-level language to the
intermediate language. Of course the computer, through the
actions of the compiler, is much better equipped to cope with
this than the human programmer trying to write this program in
assembly language.

Figure 13. Intermediate language code for the
program of Figure 1 and symbol table of Figure 8

The postfix Polish code is often referred to as zero-address
code since the operator instructions are intended to
manipulate values on the top of a push-down stack and thus do
not contain an address field. The method generally used to
generate machine code from this zero-address code is to
simulate a mythical stack machine in the compiling process.
This "meta-execution" stack of course does not usually contain
values, since most of the variables in a program have values
assigned at execution time rather than at compile time, but
rather it contains information about the program symbols. This
type of code generator is fairly simple to implement,
especially if optimization is not too important, and should be
fairly easy to adapt to a table-driven scheme (see Chapter
VIII).
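
The zero-address property can be seen in a small interpreter
for such code. The C sketch below executes a sequence of
instructions on a push-down stack; unlike the meta-execution
stack of a code generator, it holds actual values, in the
manner of the debugging interpreter suggested in Section V.A.
The instruction encoding, the operator SUB, the stack depth,
and the names "instr" and "run" are assumptions made for
illustration and are not taken from Appendix A.

```c
#include <stddef.h>

enum prefix { LIT, OPR };       /* two of the prefixes named in the text */
enum oper   { ADD, SUB, MUL };  /* hypothetical operator encoding */

struct instr {
    enum prefix prefix;
    int field;                  /* literal value, or an enum oper */
};

/* Execute n instructions and return the value left on top of the
 * stack. Returns -1 on underflow, overflow, or an unknown
 * operator (a simplification: -1 is also a possible result). */
int run(const struct instr *code, size_t n)
{
    int stack[32];
    int sp = 0;                 /* number of values on the stack */

    for (size_t i = 0; i < n; i++) {
        if (code[i].prefix == LIT) {
            if (sp == 32)
                return -1;
            stack[sp++] = code[i].field;
        } else {
            if (sp < 2)
                return -1;
            int rhs = stack[--sp];   /* operators name no addresses; */
            int lhs = stack[--sp];   /* operands come off the stack */
            switch (code[i].field) {
            case ADD: stack[sp++] = lhs + rhs; break;
            case SUB: stack[sp++] = lhs - rhs; break;
            case MUL: stack[sp++] = lhs * rhs; break;
            default:  return -1;
            }
        }
    }
    return (sp > 0) ? stack[sp - 1] : -1;
}
```

The sequence LIT 2, LIT 3, LIT 4, OPR MUL, OPR ADD is the
postfix form of 2 + 3 * 4 and leaves 14 on the stack.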

It should be noted that each "instruction" in the intermediate
language consists of two parts, a prefix and an operator or
operand. The prefix indicates the type of the instruction,
while the second part is an operator (e.g., MUL, ADD, TRA) for
an OPR prefix or an operand for other prefixes. The LIN
instruction is used to transmit line numbers from the source
program, and the DEF instructions define labels in the
intermediate code for purposes of branching. The VAL and ADR
instructions place values and addresses, respectively, on the
stack, while the LIT instruction places literal (numeric or
immediate data) values on the stack. The second field of the
LIT instruction is presented in four columns in Figure 13. The
first column gives the decimal value of the 16-bit literal
(range: 0-65535), and the second and third columns give the
hexadecimal values of the high and low order bytes,
respectively. The fourth column indicates the two ASCII
characters (if printable) represented by the value.
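
The four columns of the LIT field can be derived from the
16-bit value with two shifts and masks. The following C sketch
shows one way to do so; the exact column widths and the
treatment of non-printable bytes in Figure 13 are not
reproduced here, so the output format and the name
"format_lit" are assumptions.

```c
#include <stdio.h>

/* Format a 16-bit literal as four columns: the decimal value,
 * the high and low order bytes in hexadecimal, and the two bytes
 * as ASCII characters. A blank is shown for any byte outside the
 * printable ASCII range 0x20..0x7E. */
void format_lit(unsigned value, char *out, size_t size)
{
    unsigned v  = value & 0xFFFFu;      /* keep 16 bits */
    unsigned hi = (v >> 8) & 0xFFu;     /* high order byte */
    unsigned lo = v & 0xFFu;            /* low order byte */
    char c1 = (hi >= 0x20 && hi <= 0x7E) ? (char)hi : ' ';
    char c2 = (lo >= 0x20 && lo <= 0x7E) ? (char)lo : ' ';
    snprintf(out, size, "%5u %02X %02X %c%c", v, hi, lo, c1, c2);
}
```

For example, the literal 16706 has high byte 41 and low byte
42 in hexadecimal, which are the ASCII characters "A" and "B".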

C. OTHER REPRESENTATIONS

The reverse Polish form is not the only one used for
intermediate representation of programs. The two most commonly
used alternatives are triples and quadruples, the former being
equivalent to two-address code and the latter to three-address
code. Using this terminology, the Polish code could be
referred to as "singles."

Triples are more clearly representative of the tree structure
of a program than zero-address code, since each triple has the
form

    (operator, operand1, operand2),

and the triples are linked by pointers to show the flow of
control (either operand may actually be a pointer to another
triple whose result is used in the current triple). One
difficulty with this method is that it requires more memory to
represent the program than the Polish method, but Gries [22]
presents a method for modifying the implementation to reduce
the memory requirement.



Quadruples have a "result" field in addition to the three
fields of the triple and take the form

    (operator, operand1, operand2, result),

where the fourth field is either a temporary variable
generated by the compiler (in the case of a subexpression of
an arithmetic expression) or a program variable (e.g., the
variable on the left-hand side of an assignment). Some
quadruples (and triples also) will require only the operator
and one operand (e.g., a branch instruction), while others
will require all fields but "operand2" (e.g., for the unary
minus operator: y = -x ==> (-,x,,y)). The difficulties with
quadruples are a greater memory requirement than for the
Polish form and the large number of temporary variables which
would be generated for any significant program.
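
The quadruple form maps directly onto a four-field record. The
C sketch below represents the unary minus example from the
text and a second, illustrative expression (d = a + b * c)
that needs one compiler-generated temporary. The string
representation of the fields, the temporary naming convention
"t1," and the names "quad" and "count_temporaries" are
assumptions made for illustration.

```c
#include <stddef.h>

/* One quadruple; an unused field is represented by NULL. */
struct quad {
    const char *op;
    const char *operand1;
    const char *operand2;
    const char *result;
};

/* y = -x  ==>  (-, x, , y): unary minus leaves operand2 empty. */
const struct quad negate = { "-", "x", NULL, "y" };

/* d = a + b * c: the product is held in the temporary t1. */
const struct quad sum_of_product[] = {
    { "*", "b", "c",  "t1" },
    { "+", "a", "t1", "d"  },
};

/* Count the quadruples whose result is a temporary, taking any
 * result name of the form 't' followed by a digit to be
 * compiler-generated (a convention of this sketch only). */
size_t count_temporaries(const struct quad *q, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        const char *r = q[i].result;
        if (r != NULL && r[0] == 't' && r[1] >= '0' && r[1] <= '9')
            count++;
    }
    return count;
}
```

Every interior node of an expression tree other than the root
contributes one such temporary, which is the source of the
large number of temporaries mentioned above.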

Many compiler designers prefer triples or quadruples to Polish
code because these forms are claimed to be easier to
manipulate for optimization purposes. This is a point which
deserves further investigation; however, it should be noted
that powerful optimization techniques have been successfully
applied to programs represented by a Polish intermediate
language [13].
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digital system description 



Once the source program has been translated into an
intermediate form and its descriptive information has been
preserved in tables, the job of converting this information
into control code begins. The remaining stages of the compiler
require, in addition to the information transmitted from
pass 1, detailed knowledge of the architecture and
instruction set of the hardware in order to accomplish the
code generation task. Ordinarily this other information
would be included in the remaining stages at the time the
compiler was designed, but this cannot be done if the
architecture is unknown prior to compile time. Thus this chapter
is concerned with the types of information required and the
problem of describing this information for varying
architectures.

Many languages have been developed for describing digital
systems, and two excellent surveys of these languages
have been published [6,?5]. Because most of these languages
were developed by individuals or small groups of individuals
working on specific problems, they have shortcomings which do
not allow universal application. For this reason the
Conference on Digital Hardware Languages, a special continuing
conference of experts in the computer hardware description
field, has been formed in an attempt to define a
language which can become a standard for the industry [37].
Although the main purpose of such languages is to serve as
aids in designing and simulating digital systems, it should
be obvious that they could also be used to describe a system
as part of the compilation process.

Since there are several levels of detail which may be
used to describe a digital system, the first problem is to
choose the most appropriate one. Bell and Newell [7] have
defined a hierarchy of five levels for description of computer
systems: the circuit level, the switching circuit level,
the register transfer (RT) level, the programming level,
and the PMS (processor, memory, switch) level. The circuit
level is the lowest level and is well established, with a
notation and set of conventions which have become standardized
over many years of electrical engineering practice.
During the relatively few years that digital electronics has
been in existence the switching circuit level has also become
well established, allowing designers to avoid much of
the detail necessary in describing their systems at the
circuit level. Thus digital circuits are designed with gates
and delays rather than transistors, diodes, and other
components of the circuit level. At the other end of the
scale, the PMS level (although the specific term was coined
by Bell and Newell) has also been in use for some time,
since this is the level used to describe the gross properties
of computer systems. The programming level has also
become well developed, since most digital systems have been
the kind which require a program in order to perform useful
functions. The RT level, which is the one that seems most
natural for conveying the structure of digital systems and
interfacing between the circuit levels and the programming
level, has been recognized as a level since the 1950's but
has only recently been the subject of serious efforts aimed
at formalization [6].

During the brief history of the computer industry
computers have evolved from huge pieces of hardware with very
limited capabilities (by today's standards) to very compact
units with very broad, powerful capabilities. For all of
this change, though, the architectures of computer systems
still closely adhere to the concepts originally proposed by
von Neumann [7, ch.^]. Even the development of minicomputers
and microcomputers has not changed this fact, since most
of the same features which were successful on larger computers
have been carried over into these smaller systems. In
fact, the increased competition in the computer industry
which has been caused by the acceptance of these new types
of computers will probably have the effect of "weeding out"
features which are not well conceived or introduced merely
for uniqueness and of more or less standardizing features
which prove useful across a wide range of applications.

No attempt will be made here to describe all of the
variations in architecture which have evolved over the
years, since a comprehensive survey has been presented by
Bell and Newell [7]. Suffice it to say that, while there
are many differences among the various types of systems at
the switching circuit level, there are many similarities at
the RT level. The features which distinguish one system
from another at this level can be grouped according to a few
system characteristics.

A. BASIC CHARACTERISTICS 

The two main PMS level building blocks in a conventional
system are the central processing unit (CPU) and the memory.
In most systems the majority of the instructions are devoted
to transferring data between these two units. In order to
represent these ideas at the RT level Barbacci [6] refers to
the basic components as "operators" and "carriers."

Operators are entities that produce information by
transformation of bit patterns to which meaning has been
assigned. These bit patterns reside in carriers, which
are the entities used in storing and transmitting the
information to and from the operators. [6, p.139]

The operators can be seen to represent the functions performed
by the CPU in the classical digital system, and the
carriers are hardware memory elements with various degrees
of latency. Wires and busses can be considered short term
memories, while registers and core memories carry information
for longer intervals of time.

The basic characteristics of a digital system can thus
be described by defining the various carriers and operators
of which it is constructed. Barbacci considers the register
to be the basic unit in defining carriers, and its descriptors
are its name, dimensions, and size of alphabet (typically
2, as in binary systems). All types of memory in a
system can be constructed from registers by forming
subregisters, compound registers, and arrays. An example of
subregisters in a typical system is the partition of the
instruction register into an operation code field and one or
more operand fields. The Intel 8080 microprocessor provides
examples of compound registers; e.g., the H and L registers
are considered as separate registers in several instructions,
but when concatenated they form the address register.
The primary memory of a typical computer can be thought of
as an array of registers.
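The three ways of building carriers from registers can be sketched in a few lines. This is an illustrative model only: the Register class, its field and concatenate helpers, and the 8-bit instruction register IR with a two-bit operation field are assumptions and are not Barbacci's notation.

```python
class Register:
    """A carrier: a named group of bits (alphabet size 2 is assumed)."""
    def __init__(self, name, width, value=0):
        self.name = name
        self.width = width
        self.value = value & ((1 << width) - 1)   # keep only `width` bits

    def field(self, high, low):
        """Subregister: bits high..low, where bit 0 is least significant."""
        return (self.value >> low) & ((1 << (high - low + 1)) - 1)

def concatenate(name, high_reg, low_reg):
    """Compound register: bits of high_reg followed by bits of low_reg."""
    value = (high_reg.value << low_reg.width) | low_reg.value
    return Register(name, high_reg.width + low_reg.width, value)

# Compound register: H and L concatenated form a 16-bit address register.
H = Register("H", 8, 0x12)
L = Register("L", 8, 0x34)
HL = concatenate("HL", H, L)

# Array: primary memory modeled as 256 eight-bit registers.
memory = [Register("M%d" % i, 8) for i in range(256)]

# Subregisters: a hypothetical instruction register split into fields.
IR = Register("IR", 8, 0b01111000)
operation = IR.field(7, 6)
destination = IR.field(5, 3)
source = IR.field(2, 0)
```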

The two most primitive kinds of operators in a digital
system are the ones usually represented in a high-level
language: logical (negate, inclusive or, exclusive or, and,
equivalence) and arithmetic (addition, subtraction,
multiplication, division). Other operators needed in describing the
system include vector operators for manipulating registers
(shift, rotate), transfers, concatenation, and special
operators (counting, exchanging, etc.).

It is necessary but not sufficient for a compiler to
have information about the operators and carriers of a computer
system. Since the primary purpose of the compiler is
to generate control code for the computer, information
describing the instruction set is also required. In essence
the compiler performs a mapping from intermediate language
code to machine code, and it is necessary to provide
sufficiently detailed information to carry out this mapping. The
level of detail will be much greater if the intermediate
language is to be translated into microcode than if it is to
be translated into a conventional machine language. The bit
patterns of all of the machine instructions are required in
order to produce the actual machine code, and the mnemonics
of the instructions (in an assembly language type of format)
may be required in order to produce a version of the machine
code which can easily be read by the programmers who will be
trying to debug the programs produced. Special types of
instructions (subroutine jumps and returns, interrupt producing
and handling, input/output) must be accounted for,
and any side effects of instructions (such as the setting of
condition codes) must be described. Methods of addressing
will have an effect on the code produced and will also have
to be described. For some machines it is necessary to
transfer operands from main memory to registers in order to
operate on them, while other machines have instructions
which operate on operands directly in main memory. Many
machines have both types of instructions. Indirect addressing,
indexing, and stack manipulation are important features
which also must be described.
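One way to picture such an instruction set description is as a table that the code generator consults. The sketch below is an assumption about how such a table might be laid out, not a proposal from the sources cited; only a small subset of Intel 8080 instructions is included, and the names INSTRUCTIONS, REG, and encode are hypothetical.

```python
# 8080 register codes used in the operand fields of an instruction.
REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}

# mnemonic -> (base bit pattern, operand kinds, sets condition codes?)
INSTRUCTIONS = {
    "MOV": (0x40, ("dst", "src"), False),
    "MVI": (0x06, ("dst", "imm8"), False),
    "INR": (0x04, ("dst",), True),       # side effect: condition codes
    "CMP": (0xB8, ("src",), True),       # side effect: condition codes
    "JMP": (0xC3, ("addr16",), False),
    "HLT": (0x76, (), False),
}

def encode(mnemonic, *operands):
    """Map one instruction to its machine-code bytes using the table."""
    base, kinds, _ = INSTRUCTIONS[mnemonic]
    opcode, tail = base, []
    for kind, operand in zip(kinds, operands):
        if kind == "dst":
            opcode |= REG[operand] << 3          # destination field
        elif kind == "src":
            opcode |= REG[operand]               # source field
        elif kind == "imm8":
            tail.append(operand & 0xFF)
        elif kind == "addr16":
            tail += [operand & 0xFF, (operand >> 8) & 0xFF]  # low byte first
    return bytes([opcode] + tail)

program = [("MVI", "D", 1), ("MOV", "A", "M"), ("INR", "L"),
           ("CMP", "M"), ("JMP", 0x1234), ("HLT",)]
machine_code = b"".join(encode(*instruction) for instruction in program)
```

Keeping the bit patterns, mnemonics, and side effects in data rather than in the code generator itself is what would allow the same compiler stages to be retargeted by supplying a different table.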

The amount of detail provided about the instruction set
and the physical characteristics of the machine has a direct
bearing on the capability of the compiler to produce "good"
machine code. If sufficient information is available,
machine-dependent optimizations can be performed on the code
as it is being generated, as discussed in Section VII.A.1.

B. MICROPROGRAMMING AND MODULARITY

Two basic classes of systems can be distinguished, the
first being the conventional or classical type,
characterized by a fixed or hardwired instruction set.
Microprogrammable systems, in which the instruction set (in
the usual sense of the term) can be changed by altering
a memory, form the second class. Included in the latter
class are modular systems, which in addition to having a
variable instruction set have a variable machine organization.
Both classes of systems require the basic types of
information discussed above to be transmitted to the compiler.
The additional types of information required by the
compiler for microprogrammable systems are discussed below.

Microprogrammable systems have been growing rapidly in
use in the last few years, but for all of the special attention
which has been devoted to it, microprogramming is not
significantly different from "regular" programming. Reigel
and Lawson have defined microprogramming as "... a technique
for implementing the control function of a digital computing
system as sequences of control signals that are organized on
a word basis and stored in a memory." [48, p.2] There is
nothing in this definition which does not apply equally well
to a non-microprogrammed computer. Eckhouse has noted that,
with respect to microprogrammable hardware, "... all of the
machines can be classified as classical, von Neumann in
nature with only minor perturbations." [18, p.172]

What microprogramming has done is allow increased flexibility
in digital system design by providing the designer
with greater access to the hardware. IBM was the first company
to successfully apply microprogramming when it produced
the S/360 family of computers. The designers were able "...
to achieve a range of compatible processors offering the
same large machine instruction set at many different levels
of performance." [6, p.3] It is interesting to note that, in
a sense, the common machine code of this series could be
considered a machine-independent programming language.

Perhaps the attraction of microprogramming can best be
appreciated by considering the distinction which Barbacci
makes between architecture and machine organization. He
considers the architecture to be the behavioral description
of a system, i.e., what the programmer perceives the system
to be. On the other hand, the machine organization is "...
the particular combination of registers, busses, combinational
networks, and control ..." in a system. [6, p.1^^]

The architecture influences the machine organization by
imposing a set of requirements (a particular instruction
set) and the organization, mainly for technological reasons,
influences the architecture of the machine. The
result is usually that a given computer architecture can
be implemented on a set of machine organizations, and a
given organization accepts several architectures.
[6, p.1^a]

Thus in a conventional computer system it is necessary to
describe only the architecture of the machine. The instruction
set serves as a level of abstraction separating the
programmer (or compiler) from the machine organization. In
a microprogrammed system another instruction set may be
defined in order to preserve this abstraction, but if the
programmer is to work in a high-level language it may be
possible (but not necessarily desirable) to skip this step
by allowing the compiler to interface directly with the
machine organization.

The major hurdle in microprogramming is the introduction
of timing considerations to the list of information required.
In describing the basic information required by a
compiler, no mention was made in Section VI.A of the timing
of instructions. The reason for this is that conventional
systems usually operate in a sequential fashion, with the
execution of one instruction not beginning until the previous
instruction has been executed. Even though many events
may be occurring in parallel, this is hidden by the instruction
set. One of the values of microprogramming is that it
allows the programmer to specify the concurrency of certain
events. Unfortunately this additional flexibility is obtained
by increasing the amount of detail with which the
programmer must cope. This is why instruction sets similar
to those of conventional computers are usually defined for
microprogrammable systems. For example, the Intel 3000
series of microprocessor chips has been advertised as a
microprogrammable microprocessor; however, the series has
not yet been exploited to its fullest potential because
Intel is still in the process of defining a higher-level
instruction set comparable to the typical machine language.
More of the considerations involved in working with concurrent
systems are presented below in Section VI.C.

Another trend which is occurring in the digital electronics
field is the development of modular systems
[16,31,56]. The reasons for development of such systems are
very similar to the reasons for some of the software
engineering concepts (see Section I.B). In fact, modularity
(of programs) is one of the principles of software engineering.
In addition to the variable architecture characteristic
of other microprogrammable systems, modular systems have
variable machine organizations. Thus the task of programming
these systems is even more difficult.

If development of different types of components can be
reduced, and if standardization of modules, test procedures,
and logistic support can be achieved, the life-cycle
cost of systems can be greatly reduced. One
approach to implementation of this idea is to identify a
level of modularity for components which can have wide
application in many types of systems. This allows the
development cost of the modules to be spread over many
units, while reducing the quantity of components and the
logistics costs. [56, p.3]

Figure 14. Example of a three-bus
modular system using QED modules

One of many possible configurations for a modular system
is shown in Figure 14 [56, p.19]. The control module in
this type of system would probably consist of a read-only
memory programmed to initiate all actions required by the
system performance specifications. This type of system
would be very similar to a microprogrammable computer in its
operation and control code structure. Thus the main problem
in programming such a system will be to take maximum advantage
of parallelism. The designer will also have the problem
of selecting the most appropriate types and numbers of
modules to use and the problem of choosing the most efficient
organization (i.e., the number of busses and the
connections of modules to busses).

An alternative type of modular system would spread the
control function among all of the modules rather than
consolidating it into a single module. Such a system would not
be program controlled but would accomplish its tasks by having
the various modules communicate with one another by
means of "ready" and "acknowledge" signals. Because such
modules would not be as flexible as the type shown in Figure
14, they probably will not be as widely used.

C. PARALLELISM 

The rapid growth of the computer industry has been
spurred by the ever-increasing speed of computer hardware
brought about by continuing advances in the electronic
component industries. Vacuum tubes gave way to transistors in
the late 1950's and early 1960's, and the latter were
supplanted by integrated circuits in the mid-1960's. These
small-scale integrated (SSI) circuits were soon antiquated
by medium-scale integration (MSI) technology, and direct
gate-level and register-level design (without considering
the circuit level) became possible. Today large-scale
integrated (LSI) circuitry is available in large quantities,
and whole subsystems can be manufactured on one small chip
of silicon. While these tremendous increases in circuit
density have played an important part in increasing the
speed of digital systems, they have been accompanied by
advances in the state-of-the-art in semiconductor manufacture
which have allowed much faster switching times to be
achieved.

Unfortunately, there are physical limits to the
processes which have brought about these vast changes, and
the semiconductor industry will soon be nearing them. As a
result, increases in computer circuit speed will be coming
at a much slower rate than in the past (unless some new
technology is discovered which does not depend on the motion
and storage of electrons). Thus any further major advances
in computer speed will have to rely on increased use of
advanced machine organization techniques which take advantage
of features such as parallelism and pipelining.
Parallelism refers to the concurrent performance of multiple
tasks in a system, where a task is "... a self-contained
portion of a computation (or some other computer operation)
that once initiated can be carried out to its completion
without need for additional inputs." [46, p.986] Pipelining
is accomplished by dividing a task into many independent
subtasks such that a new process can begin the task as soon
as the previous one has completed the first subtask. In
this way many processes can be performing the same task,
each being in a different stage of completion.
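The benefit of pipelining can be estimated with a simple count. The sketch below assumes an idealized pipeline in which every subtask takes one time unit and a new process may enter as soon as the first subtask is free; the function names are hypothetical and real hardware would rarely be this regular.

```python
def sequential_time(processes, subtasks):
    """Each process runs all of its subtasks before the next one starts."""
    return processes * subtasks

def pipelined_time(processes, subtasks):
    """The first process needs `subtasks` time units; each later process
    finishes one time unit after the process ahead of it."""
    if processes == 0:
        return 0
    return subtasks + processes - 1

# Ten processes performing a task that is divided into four subtasks.
without_pipeline = sequential_time(10, 4)
with_pipeline = pipelined_time(10, 4)
```

Under these assumptions the pipelined time approaches one time unit per process as the number of processes grows, regardless of the number of subtasks.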

Another factor which has resulted in the increased use
of parallelism in digital systems is the growing emphasis on
modularity and microprogramming discussed in Section VI.B.
Modularity in hardware may be looked at from several
viewpoints, from small modules such as registers and busses
considered in microprogramming to large functional modules
as discussed by Tinklepaugh and Eddington [56]. This wide
diversity means that there are several levels of parallelism
which must be considered, each with its own unique problems
and methods of solution [46].

The fact that parallelism is a significant consideration
in the design of a digital system is highlighted by the fact
that at least one entire book has been devoted to the subject
[39]. Several high-level language compilers have been
developed or proposed for use with the new generation of
array processors, which rely heavily on the use of parallelism
at the instruction and arithmetic expression levels
[35]. Ramamoorthy, Park, and Lee [46] present a good overview
of some of the factors involved in working with parallelism
and present several algorithms for taking advantage
of it at the arithmetic expression and subexpression levels.

There are two basic ways in which parallelism may be
handled in a high-level language--explicitly or implicitly.
In the explicit approach there are special instructions
included in the language (e.g., FORK and JOIN) by which the
programmer may indicate sections of code which may be executed
in parallel.

The explicit approach is advantageous for the recognition
and representation of parallelism between blocks of
instructions or between instructions, since the analysis
of parallelism between tasks at these levels is simple.
However, the explicit approach is not advantageous for
recognizing and representing parallelism at arithmetic
operation (subexpression) or micro-step level, because it
is tedious and mistake prone. [46, p.986]

The implicit approach places the burden on the compiler by
incorporating two new steps in the compilation process. The
first step involves the recognition of parallel processable
tasks, and the second involves representation of the information
obtained in the first step and allocation of
resources in such a manner that maximum advantage is taken
of the parallelism. "This approach involves considerable
overhead to recognize parallel tasks in a program although
it relieves programmers of additional duties." [46, p.986]
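The recognition step of the implicit approach can be illustrated at the arithmetic expression level. The following is a minimal sketch and not one of the algorithms of Ramamoorthy, Park, and Lee: it assumes the expression is already in quadruple form, that each result name is assigned exactly once, and that only data dependence matters; the name parallel_levels is hypothetical.

```python
def parallel_levels(quads):
    """Group quadruple indices into levels. Quadruples in the same level
    do not depend on one another and could be executed in parallel."""
    ready_at = {}     # result name -> level at which it is produced
    levels = []
    for index, (op, a, b, result) in enumerate(quads):
        # A quadruple can run one level after its latest-produced operand;
        # program variables not produced here are available at the start.
        level = max(ready_at.get(a, -1), ready_at.get(b, -1)) + 1
        ready_at[result] = level
        while len(levels) <= level:
            levels.append([])
        levels[level].append(index)
    return levels

# x = (a*b + c*d) - e
quads = [("*", "a", "b", "T1"),
         ("*", "c", "d", "T2"),
         ("+", "T1", "T2", "T3"),
         ("-", "T3", "e", "x")]
levels = parallel_levels(quads)
```

Here the two multiplications fall in the same level, so a system with two multipliers could perform them concurrently; the resource allocation step would then decide whether the hardware permits it.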

Thus, as is usually the case in design work, there are
tradeoffs involved in determining whether to use the explicit
or the implicit approach to parallelism. At the current
state-of-the-art in compiler design it is probably desirable
to use some combination of the two. The explicit approach
can be used for blocks of instructions (e.g., subroutines),
and the implicit approach can be used at the arithmetic
expression level to reduce the number of programming errors
which would result from using the explicit approach at this
level.

In either approach the compiler can be given the job of
allocating the resources for execution of the program. The
major problem in this area involves the proper synchronization
of the various pieces of hardware. Again there appear
to be two possible methods for approaching this problem.
One way would require that the description of each module
contain information about the maximum time required for the
module to perform a given function. Then once the compiler
had assigned a task to a particular module it would have to
allow the specified amount of time before it could issue an
instruction requiring the results of that module to be
available. This approach has the disadvantage that it would
not allow the hardware to operate at maximum speed, since
many functional modules (e.g., multipliers) have a wide
variation in speed depending on the input data. The other
approach is the one used in modern operating systems for
sophisticated computers. In essence this would involve setting
up a small operating system which would perform the
resource allocation at execution time rather than compile
time. Synchronization would be accomplished by sending control
signals to the various modules and receiving signals
from the modules to indicate task completion. Obviously
this method involves a fairly significant amount of overhead
in the form of additional memory required to hold the
operating system. Thus the programmer would have another
tradeoff to make in determining which method would be most
appropriate for his application.
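The first, compile-time method can be sketched as a simple scheduler driven by worst-case module times. Everything in the sketch is an assumption made for illustration: the module names, the times in MAX_TIME, the task format, and the requirement that tasks be listed after the tasks whose results they need.

```python
# Hypothetical worst-case execution times taken from module descriptions.
MAX_TIME = {"multiplier": 8, "adder": 2}

def schedule(tasks):
    """tasks: list of (name, module, names of tasks whose results are used).
    Returns {name: (start, finish)} computed from worst-case times."""
    module_free = {}      # module -> time at which it next becomes free
    times = {}
    for name, module, needs in tasks:
        # Wait for the module itself and for every needed result.
        start = max([module_free.get(module, 0)] +
                    [times[needed][1] for needed in needs])
        finish = start + MAX_TIME[module]
        module_free[module] = finish
        times[name] = (start, finish)
    return times

# Two products formed on a single multiplier, then summed on the adder.
tasks = [("p1", "multiplier", []),
         ("p2", "multiplier", []),
         ("s", "adder", ["p1", "p2"])]
times = schedule(tasks)
```

Because the schedule uses the maximum time for every operation, the adder waits the full sixteen time units even when the actual multiplications finish early, which is the disadvantage of this method noted above.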

As in the other sections of this chapter, many ideas
have been presented in this section. No specific recommendations
have been made as to which of them may be applicable
to a compiler for user-definable architectures, since the
determination of such recommendations will require a considerable
amount of additional research and experimentation.
The intent here has been to exhibit a (not necessarily
all-inclusive) list of some of the things which must be considered
in implementing "pass 2" of such a compiler.

VII. COMPILER OPTIMIZATION



Optimization is a frequently pursued goal in the design
of engineering systems, whether they be hardware systems or
software systems. The mathematical solution of an optimization
problem requires finding the minimum of a cost function
(or maximum of a reward function) while satisfying a set of
constraints. Unfortunately the equations involved are often
nonlinear, making a closed-form solution impossible.
Attempts at solution by enumerating all the possibilities
are usually not practical for nontrivial problems, because
the enumeration expands in a combinatorial manner. (In
optimal control theory this problem has sometimes been referred
to as the "curse of dimensionality.") Thus, though a
large body of theory has been developed to deal with optimization
problems, often the only practical solution to a
problem involves the use of ad hoc methods. Such has been
the case to a large extent in dealing with the problem of
code optimization.

Another significant barrier to the application of good 
optimization techniques is the general difficulty of speci- 
fying what constitutes an optimal solution to a given design 
problem. In fact it has been noted by Aho and Ullman, with 
respect to the code generated by a compiler, 

... that there is no algorithmic way to find the shortest
or fastest-running program equivalent to a given program.
... Thus the term optimization is a complete misnomer--in
practice we must be content with code improvement. Various
code improvement techniques can be employed at various
phases of the compilation process. [3, p.70-71]

It is the purpose of this chapter to discuss the motivation
for research in the area of compiler optimization and to
examine some of the formal techniques which may prove useful
in implementing compilers for firmware design languages.

A. MOTIVATION 

As far back as the early 1950's, when the FORTRAN I compiler
was being designed, it was recognized that convenience
alone was not enough to persuade programmers to use
high-level languages [52]. Unless the compiler could produce
machine code which was comparable in efficiency to
hand-coded programs there would be a great deal of resistance to
the use of high-level languages. In the intervening years
computer architectures and instruction sets have increased
in complexity, making it even more difficult for a compiler
to match a good assembly language programmer.

Three computer hardware trends which have developed over
the years are the increase in speed, the increase in main
memory size, and the increase in size and power of the
instruction set. These trends have had the effect of reducing
the need for optimization in compilers, since for many
applications the hardware efficiency more than offset the
compiler-generated code inefficiency. In recent years there
has been yet another trend--the acceptance of minicomputers,
and now microcomputers, as components in the design of
larger systems. In such applications the cycle is beginning
to repeat, since these smaller computers typically have slow
execution times, a small amount of memory, and a relatively
limited number of instructions. Thus the programs written
for these devices must be as efficient as possible in order
to minimize the amount of hardware used. Even in sophisticated
microprogrammable systems, though, the amount of
hardware used may be critical in determining the profitability
of a given design. As a consequence, code optimization
is becoming increasingly important in order to allow
firmware designers to take advantage of all the benefits of
high-level language programming.

An example of the kinds of inefficiencies involved is
shown in Figures 15-17. Figure 15 shows a PL/M program for
performing a simple bubble sort, while Figure 16 shows a
hand-coded Intel 8080 assembly language version of the same
program [43]. Note that neither of these programs would
actually be run by itself but would probably be a procedure
in a larger program (in which ARRAY and N would be given
values). The purpose here is to examine the code generated
for the sorting algorithm without getting involved in the
various issues of subroutine linkage.

Figure 17 shows the output (reformatted by the author
for ease of comparison) of the Intel 8080 PL/M compiler,
version 1.0. Not counting storage space for the variables,
the hand-coded version requires 40 bytes of storage, and the
compiler version requires 116 bytes--a relative inefficiency
of 190 percent. A similar but somewhat larger version of

DECLARE ARRAY(256) BYTE,
        (N,I,T1,T2,SWITCHED) BYTE;
SWITCHED = 1;
DO WHILE SWITCHED;
    SWITCHED = 0;
    DO I = 1 TO N - 1;
        T1 = ARRAY(I); T2 = ARRAY(I + 1);
        IF T1 > T2 THEN
            DO;
            ARRAY(I+1) = T1;
            ARRAY(I) = T2;
            SWITCHED = 1;
            END;
    END;
END;
EOF

Figure 15. PL/M bubble sort program



 1.        MVI   D,1
 2.  L1:   MOV   A,D
 3.        ADI   0
 4.        JZ    L2
 5.        MVI   D,0
 6.        MVI   H,N.H
 7.        MVI   L,N.L
 8.        MOV   B,M
 9.        MVI   H,ARRAY.H
10.        MVI   L,ARRAY.L
11.  L3:   DCR   B
12.        JZ    L1
13.        MOV   A,M
14.        INR   L
15.        CMP   M
16.        JP    L3
17.        MOV   C,M
18.        MOV   M,A
19.        DCR   L
20.        MOV   M,C
21.        INR   L
22.        MVI   D,1
23.        JMP   L3
24.  L2:   HLT

Figure 16. Hand-coded 8080 assembly language
version of bubble sort program (after Popper [43])

Figure 17. Reformatted PL/M compiler output
for bubble sort program of Figure 15

this program, cited by Falk [19], was hand-coded for the
Intel 8008, and required 397 bytes of code. The initial
version of the 8008 PL/M compiler generated 995 bytes of
code for this larger program--a relative inefficiency of 97
percent. A later version of the 8008 PL/M compiler (containing
improved optimization techniques) generated 388
bytes of code, yielding a 12 percent relative inefficiency
[39].
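The 8080 bubble sort comparison can be checked with a short calculation. The sketch below assumes that relative inefficiency means the excess of compiler-generated bytes over hand-coded bytes, expressed as a percentage of the hand-coded size (a definition inferred from the 190 percent figure, not stated in the text); the instruction lengths are the standard 8080 lengths for the mnemonics appearing in Figure 16, and the names used are hypothetical.

```python
# Lengths in bytes of the 8080 instructions used in Figure 16;
# every mnemonic not listed here occupies a single byte.
LENGTH = {"MVI": 2, "ADI": 2, "JZ": 3, "JP": 3, "JMP": 3}

# Mnemonics of the 24 instructions of the hand-coded program, in order.
FIGURE_16 = ("MVI MOV ADI JZ MVI MVI MVI MOV MVI MVI DCR JZ "
             "MOV INR CMP JP MOV MOV DCR MOV INR MVI JMP HLT").split()

def code_size(mnemonics):
    """Total bytes occupied by a sequence of instructions."""
    return sum(LENGTH.get(mnemonic, 1) for mnemonic in mnemonics)

def relative_inefficiency(compiled_bytes, hand_bytes):
    """Excess compiled code as a percentage of the hand-coded size."""
    return 100.0 * (compiled_bytes - hand_bytes) / hand_bytes

hand_bytes = code_size(FIGURE_16)
bubble_sort_inefficiency = relative_inefficiency(116, hand_bytes)
```

Counting the instruction lengths gives 40 bytes for the hand-coded program, and 116 compiled bytes against 40 hand-coded bytes gives the 190 percent relative inefficiency quoted for the 8080 compiler.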

This tends to confirm that, in most applications,
compiler-produced code compares more favorably with
assembly code as the size of the program increases. Also
the current PL/M compilers use relatively unsophisticated
optimization techniques, and further improvements could be
obtained with relatively little additional effort.

Comparison of Figures 16 and 17 shows that the bubble
sort program brings out two of the most severe problems in
compiler code generation--the register allocation problem
and the subscript calculation problem. Less significant,
but also evident, are the differences in the methods of
branching for the loops and the four extra bytes of code
generated by the compiler for all programs (to set the stack
pointer at the beginning and enable interrupts at the end).

The assembly language version takes advantage of the 
fact that there are enough index registers available on the 
8080 to hold all temporary variables needed in the sort 
routine. This saves at least three bytes of code (to load 
the address into the HL register) each time one of these 
variables is referenced (unless the address is already in HL
from a previous reference). 

The greatest saving in the program of Figure 16 results
from the use of the HL register as a pointer into the array
being sorted. The programmer realized that the elements of
the array being referenced at any time were always adjacent
to one another, and he stepped through the comparisons and
swaps by appropriately incrementing and decrementing the
address register. The current compiler is not capable of
making this optimization and so recomputes subscripts for
each variable reference.

An attempt was made to rewrite the PL/M program to more
closely match the structure of the assembly language version
(see Figure 18). In the new program the iterative loop was
replaced with a WHILE loop, and the swapping process was
modified so that it would use only one temporary variable.
Unfortunately the savings produced by these changes were
offset by the computation of one additional subscript, and
the new program generated as much code as the old.

As mentioned in Chapter II, one of the advantages of
programming in a high-level language is the ease with which
changes can be made. Figure 19 shows the minor PL/M program
changes (to the declaration statements) which would be
required if the array to be sorted contained more than 256
values and if the values were double-byte (address) rather
than single-byte. Two other changes have been indicated in
order to make the program technically correct. Of course
DECLARE ARRAY(256) BYTE,
        (SWITCHED, N, I, TEMP) BYTE;
SWITCHED = 1;
DO WHILE SWITCHED;
    SWITCHED = 0;
    I = N;
    DO WHILE (I := I - 1);
        IF ARRAY(I) > ARRAY(I+1) THEN
        DO;
            TEMP = ARRAY(I);
            ARRAY(I) = ARRAY(I+1);
            ARRAY(I+1) = TEMP;
            SWITCHED = 1;
        END;
    END;
END;
EOF

Figure 18. PL/M program revised to match the control
structures of the assembly language version


DECLARE ARRAY(256) ADDRESS,
        (N, I, TEMP) ADDRESS,
        SWITCHED BYTE;
SWITCHED = 1;
DO WHILE SWITCHED;
    SWITCHED = 0;
    I = N - 1;
    DO WHILE (I := I - 1) + 1;
        IF ARRAY(I) > ARRAY(I+1) THEN
        DO;
            TEMP = ARRAY(I);
            ARRAY(I) = ARRAY(I+1);
            ARRAY(I+1) = TEMP;
            SWITCHED = 1;
        END;
    END;
END;
EOF

Figure 19. Modified PL/M program
the changes caused much more code to be generated (197 bytes
as opposed to 116) in order to handle the double-byte
arithmetic and data transfer operations. The interesting
point here is that an assembly language version, if one had
been written for this new problem, would certainly be much
longer than 40 bytes, since there would be insufficient
registers available to hold all of the temporary values.
Also, the compiled version would have a much lower relative
inefficiency in relation to such a hand-coded version of the
new program. The amount of effort required to change the
assembly language program would obviously have been many
times greater than was the case for the PL/M version.
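
The size of the hand-coded routine can be checked by summing
instruction lengths over the listing of Figure 16. The
following is only an illustrative sketch: the lengths are
the standard 8080 encodings, while the helper names and the
formula used for relative inefficiency (extra compiled bytes
divided by hand-coded bytes) are assumptions and are not
taken from this thesis.

```python
# Standard 8080 instruction lengths (bytes) for the opcodes used in Figure 16.
LENGTH = {"MVI": 2, "MOV": 1, "ADI": 2, "JZ": 3, "JP": 3, "JMP": 3,
          "DCR": 1, "INR": 1, "CMP": 1, "HLT": 1}

# Opcode sequence of the hand-coded bubble sort, in listing order.
FIGURE_16 = ["MVI", "MOV", "ADI", "JZ", "MVI", "MVI", "MVI", "MOV",
             "MVI", "MVI", "DCR", "JZ", "MOV", "INR", "CMP", "JP",
             "MOV", "MOV", "DCR", "MOV", "INR", "MVI", "JMP", "HLT"]


def code_size(opcodes):
    """Total size in bytes of a sequence of opcodes."""
    return sum(LENGTH[op] for op in opcodes)


def relative_inefficiency(compiled_bytes, hand_bytes):
    """Assumed definition: extra compiled bytes as a percentage of hand-coded bytes."""
    return 100.0 * (compiled_bytes - hand_bytes) / hand_bytes


hand = code_size(FIGURE_16)
print(hand, relative_inefficiency(116, hand))
```

Under this assumed definition, the 116 bytes quoted above
for the compiled single-byte program correspond to a
relative inefficiency of 190 percent for this small routine.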

B. TECHNIQUES

In his excellent compiler optimization survey Schneck
[52] has classified optimization techniques into three
functional categories based upon the amount of knowledge
they require about the object machine. He calls the three
categories machine-dependent, architecture-dependent, and
architecture-independent. Some of the more important
techniques in each category are highlighted below in order
to show that many of the inefficiencies usually associated
with compiler-generated code can be eliminated if careful
attention is paid to optimization.

1. Machine-Dependent

Machine-dependent optimizations are also classified
as local optimizations since they are applied to short spans
of code during the code generation process rather than prior
to code generation as indicated in Figure [illegible]. Thus
these techniques require a detailed knowledge of the
instruction set of the object machine. For example, if the
operation to be performed is an addition and one of the
operands is known at compile time to have a value of one,
the code generated would be an increment instruction if one
were available.
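
The special case just described can be sketched as a small
instruction selector. This is a minimal illustration, not
the logic of the Intel compilers: the function name and its
parameters are hypothetical, the target is assumed to be the
8080 accumulator, and the mnemonics follow 8080 usage.

```python
def select_add(operand, has_increment=True):
    """Choose an instruction for A := A + operand.

    `operand` is an int when its value is known at compile time,
    otherwise the name of a register holding the value.
    """
    if isinstance(operand, int):
        if operand == 1 and has_increment:
            # Special case: one-byte increment instead of a two-byte add.
            # (On the 8080, INR does not set the carry flag, so a real
            # code generator would also check that carry is not needed.)
            return "INR A"
        return f"ADI {operand}"   # add immediate value
    return f"ADD {operand}"       # add register


print(select_add(1), select_add(5), select_add("B"))
```

Each such test covers exactly one special case, which is
why this style of optimization grows by accretion as the
compiler writer thinks of further cases.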

The majority of the optimizations in pass 2 of the
Intel PL/M compilers fall into this category. The results
of some of the more subtle ones can be seen in Figure 17.
It should be noted that the MVI instruction has been used
whenever possible to perform data transfers. This
instruction requires two bytes of memory rather than the
three required by the LXI instruction, which could also have
been used for this purpose. Also noteworthy is the use of
the increment and decrement instructions.

As an indication that these kinds of optimizations
may not be as easy to apply as it might at first appear,
consider Figure 20. This figure shows the PDP-11 machine
code [15] generated for the two functionally equivalent sets
of C language statements discussed in Section III.B. It can
be seen that, while the compiler has used increment and
decrement instructions in both cases, the code in Figure
20(b) is less efficient than the other, even though it has
been passed through the optimizer associated with the C
compiler. (In fairness, it should be noted that the
optimizer is claimed to be only experimental.) This points
up the fact that machine-dependent optimizations tend to be
applied in
[Listing (a) is incomplete in this copy. Legible fragments:
sub; j,r0; sub k,r0; j; mov r0,i]

(a)

i = (j = j + 1) - k;
i = j - k; j = j - 1;

        mov     j,r0
        inc     r0
        mov     r0,j
        sub     k,r0
        mov     r0,i
        mov     j,r0
        sub     k,r0
        mov     r0,i
        mov     j,r0
        dec     r0
        mov     r0,j

(b)

Figure 20. PDP-11 assembly code for two
equivalent sets of C language statements
(a) using increment/decrement feature,
(b) using addition and subtraction






an ad hoc manner, that is, by testing a series of conditions
which would indicate special cases in the code being
generated.

There is little, if any, mathematical rigor associated
with these methods, and they thus are very similar to
the kinds of optimizations which an assembly language
programmer would make. This is the major reason for the
crossover in relative efficiency between assembly language
and high-level language programs as program size increases
(see Section II.A). For the high-level language the special
cases must be foreseen by the compiler writer. Since he
probably will overlook some, the compiler will generate some
code which is obviously inefficient, as in Figure 20(b).
Nevertheless, those optimizations which can be applied to
cases foreseen by the compiler designer will be applied
consistently by the compiler every time the appropriate
conditions are satisfied. The assembly language programmer,
on the other hand, will easily spot the kinds of
inefficiencies shown in the example (and in the example of
Figures 15-17), but he may not be consistent in applying
optimizations and may not recognize others because of the
complexity of the program. The inefficiencies contributed by
these two factors tend to build up rapidly as the assembly
language program increases in size.

2. Architecture-Dependent

Architecture-dependent optimizations are global in
nature and depend on the architecture of the object machine
but not the instruction set. Examples of architectural
features which are considered are the number of registers,
the number of processing elements, and the degree of
pipelining. As can be seen, these types of optimizations
generally involve resource allocation and thus would be very
important in compiling for microprogrammable and modular
systems. The reason these are considered to be global
optimizations is that the resource requirements of diverse
segments of a program must be considered when making the
allocations.

The important register-allocation problem fits into
this category, since its solution depends on the numbers and
types of registers available in the architecture. The
particular machine instructions are not important in this
case. It should be recalled that poor register allocation
was one of the major causes of inefficiency in the code of
Figure 17. Algorithmic solutions have been found for the
register-allocation problem for simple straight-line
(non-looping) programs [52], but a general solution is
either not possible or not practical. The former is usually
the case in programs which contain conditional branches,
since the flow of execution of the program is almost always
unknown at compile time, and this information is needed for
an optimal solution. The latter is usually true for long
programs, even if they contain no loops, since an optimal
solution would require an analysis of the entire program and
an enumeration of all possible combinations of register
assignments. Freiburghouse [20] has recently presented a
method for solving the register-allocation problem which
takes advantage of information which can be accumulated
during the normal course of compilation and which appears to
give results closer to the optimum than other proposed
solutions.

As discussed in Section VI.C, parallelism is an
important feature in firmware systems, and the generation of
code to take maximum advantage of this parallelism is
another architecture-dependent optimization problem. An
important use of parallelism in improving execution of a
program lies in the area of reducing the time required for
iterative segments of code. For example, the PL/M code of
Figure 21 could be translated into more time-efficient code
if several arithmetic units were available than if only one
were available.



DO I = 1 TO 20;

A = (B(I) + C(I) + D) * 2;
C(I) = C(I) + A;

D(I) = D(I) - A;
END;



Figure 21. Iterative code for which
parallel processing would be useful 
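
The potential gain can be estimated with a very simple
timing model. The sketch below is an illustration under
stated assumptions only: whole iterations are dealt out to
identical arithmetic units, each iteration is given its own
copy of the temporary A, every operation takes one time
step, and the figure of five operations per iteration is
hypothetical.

```python
import math


def loop_time(iterations, ops_per_iteration, units):
    """Time steps needed when whole iterations are distributed over
    `units` identical arithmetic units (one operation per step each)."""
    rounds = math.ceil(iterations / units)
    return rounds * ops_per_iteration


for units in (1, 2, 4):
    print(units, loop_time(20, 5, units))
```

With these assumptions the 20 iterations take 100 steps on
one unit, 50 on two, and 25 on four.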



As is usually the case in optimization problems, 
there are tradeoffs which must be made when dealing with 
parallelism. The two methods shown in Figure 22 for 
[Tree diagrams (a) and (b) are not reproduced in this copy.]

Figure 22. Tree structure for serial and
parallel computation of an expression. (a) Tree yielding
minimum number of registers, (b) Tree yielding
maximum inherent parallelism [52, p.2]
calculating an expression can be used to illustrate this
point. If code were being generated for a machine with only
one register the scheme in Figure 22(a) would be better than
that of Figure 22(b), while the reverse would be true for a
machine with four multipliers and four registers. For a
machine with fewer than four multipliers, though, it is not
obvious which method would be better. In such situations an
analysis must be made of the various types of instructions
involved. One way to do this would be to assign weights to
the instructions based upon their execution times (e.g., a
multiplication instruction would have a greater weight than
an addition instruction) and then generate the code which
achieved the minimum total weight for the desired
computation.
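
The tradeoff can be made concrete with a small sketch. The
expression of Figure 22 is not legible here, so a product of
eight operands is assumed purely for illustration. The
register count uses Sethi-Ullman style labeling, a standard
technique substituted here and not taken from this thesis,
under the assumption that the right operand of an
instruction may come directly from memory. All names are
hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Mul:
    left: "Node"
    right: "Node"


Node = Union[str, Mul]   # a leaf is simply an operand name


def chain(names):
    """Left-deep (serial) tree: (((a*b)*c)*d)..."""
    tree = names[0]
    for name in names[1:]:
        tree = Mul(tree, name)
    return tree


def balanced(names):
    """Balanced (parallel) tree: ((a*b)*(c*d))*..."""
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return Mul(balanced(names[:mid]), balanced(names[mid:]))


def registers(node, is_right=False):
    """Registers needed to evaluate the tree serially."""
    if isinstance(node, str):
        return 0 if is_right else 1   # a right-hand leaf is used from memory
    left = registers(node.left)
    right = registers(node.right, True)
    return max(left, right) if left != right else left + 1


def depth(node):
    """Multiplications on the longest path (time with unlimited multipliers)."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def weighted_cost(opcodes, weights):
    """Total weight of an instruction sequence."""
    return sum(weights[op] for op in opcodes)


operands = list("abcdefgh")
for build in (chain, balanced):
    tree = build(operands)
    print(build.__name__, registers(tree), depth(tree))
```

Under these assumptions the serial tree needs one register
but seven successive multiplications, while the balanced
tree finishes in three steps but needs three registers even
when evaluated serially, and four products must be held at
once if its first level is computed in parallel. The
`weighted_cost` helper shows the weighting idea: the code
generator would compare the totals of candidate sequences.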

3. Architecture-Independent

The final and most general category consists of the
architecture-independent optimizations. Since these do not
depend on the architecture or the instruction set of the
object machine, they are obviously applicable to compiling
for user-definable architectures. These kinds of
optimizations can be applied to the intermediate language
code without considering the hardware features available.
As in the case of architecture-dependent optimization, these
optimizations are global in nature. The most commonly
applied techniques in this class are common subexpression
elimination, dead variable elimination, code motion, and
constant propagation. Since these techniques are widely
discussed in the literature (see, e.g., [4] for a good
survey and [13] for an application), only a brief
description is presented here.

Common subexpression elimination is the most widely
employed technique [52]. Basically it is concerned with
avoiding redundant computations, such as for the second
occurrence of "B * C" in Figure 23(a). Dead variables are
those which, beyond a given statement, never again appear on
the right-hand side of an assignment or are never again
referenced. In the first case the variable need not be kept
in a high-speed register, and in the second case it need not
any longer be assigned any memory at all. Code motion
refers to the movement of sections of code so as to reduce
the execution time of a program. For example, the section
of code shown in Figure 23(b) would be significantly
improved if the assignment to "D" were moved outside of the
loop. Constant propagation is really a special case of code
motion, since calculations involving only known constants
are moved from the execution phase of a program to the
compilation phase. The computations of "C" and "D" in Figure
23(c) provide examples of propagated constants.
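
Constant propagation over straight-line code can be sketched
in a few lines. The following is an illustration only, not
the method of any compiler discussed here: the function
names are hypothetical, Python's own expression parser
stands in for a compiler front end, and the last statement
of Figure 23(c) is read here as D = A * C + A.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def fold(node, known):
    """Return the compile-time value of an expression, or None if unknown."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return known.get(node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left, right = fold(node.left, known), fold(node.right, known)
        if left is not None and right is not None:
            return OPS[type(node.op)](left, right)
    return None


def propagate(statements):
    """Rewrite `target = expr` statements, replacing computable ones by constants."""
    known, out = {}, []
    for text in statements:
        target, expr = (part.strip() for part in text.split("=", 1))
        value = fold(ast.parse(expr, mode="eval").body, known)
        if value is None:
            known.pop(target, None)   # target no longer holds a known constant
            out.append(f"{target} = {expr}")
        else:
            known[target] = value
            out.append(f"{target} = {value}")
    return out


print(propagate(["A = 3", "B = C * D", "C = A + 5", "D = A * C + A"]))
```

The assignments to C and D are reduced to constants at
"compile time", while B is left alone because its operands
are not yet known when it is computed.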

Architecture-independent optimization techniques
rely heavily on theoretical work and are amenable to the
application of sophisticated algorithms. They usually
involve a global flow analysis of the intermediate form of
the program and may rely on graph theory or matrix analysis.
Unfortunately most of these techniques are very complicated
and require large amounts of memory and time.
A = B * C + D;
Q = D + R;
X = P + B * C;

(a)

DO I = 1 TO 1000;
    A(I) = B(I) * C(I);
    D = X * Y / Z + 50;
    B(I) = C(I) * D;
END;

(b)

A = 3;
B = C * D;
C = A + 5;
D = A * C + A;

(c)

Figure 23. Architecture-independent
optimization candidates. (a) Common subexpression,
(b) Code motion, (c) Constant propagation
Since a useful program usually contains many
branches and loops which make it impossible to know at
compilation time how often (if ever) many sections of code
will be executed, frequency analysis is sometimes employed
in the optimization process. By assigning a relative
frequency or weight to each block of code in a program, the
programmer allows the optimizer to perform a Monte Carlo
simulation to determine the "optimum" code sequence [52].
There have even been proposals [24] to employ an adaptive
optimization process to perform the optimizations at run
time. In such a scheme a large portion of the effort would
be devoted to optimizing sections of code which are heavily
used, since they account for most of the execution time.
Such a scheme probably would not be practical for most
real-time systems unless the adaptive optimization were done
during the development process and the resulting
optimizations were applied to the final system in a
non-adaptive mode.
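
The use of programmer-supplied frequencies can be sketched
as follows. This is a simplified stand-in, not the Monte
Carlo procedure mentioned above: it compares candidate code
sequences by a direct weighted sum, and the candidate names,
frequencies, and cycle counts are all hypothetical.

```python
def expected_cost(blocks):
    """Sum of (relative frequency x cost) over the blocks of one candidate."""
    return sum(freq * cost for freq, cost in blocks)


def pick_candidate(candidates):
    """Return the name of the candidate with the lowest expected cost."""
    return min(candidates, key=lambda name: expected_cost(candidates[name]))


candidates = {
    # (relative frequency, cost in cycles) for each block
    "test_in_loop": [(1, 4), (100, 12)],
    "test_hoisted": [(1, 9), (100, 10)],
}
print(pick_candidate(candidates))
```

Here the second candidate is chosen because its more
expensive setup block runs once, while its cheaper loop body
runs a hundred times.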

C. APPLICATION 

From the discussion in Sections VII.A and VII.B it can
be seen that compiler optimization is a complex problem. A
good optimizing compiler, in effect, attempts to match wits
with a good assembly language programmer. In order to do
this effectively the compiler must have a great deal of
"artificial intelligence" built into it, and this is
something which, unfortunately, is difficult to do.
"Optimizations originating in the academic and scientific
community tend to be global, while, until recently,
manufacturers have concentrated on local and
machine-dependent techniques." [52, p.1] More efficient
algorithms must be developed in order to allow the academic
solutions to become more useful in practical compilers.
More general and powerful techniques for handling local and
machine-dependent optimizations must also be found. For the
types of systems under consideration here, many of the
techniques discussed above are already practical, since
compilation costs are only a small part of total development
cost.

A great deal of care must be exercised in the
application of optimization techniques in code generation.
Many of the techniques involve reordering of arithmetic
operations, and this can lead to unexpected and often
undesired results (e.g., from a numerical analysis point of
view). Thus it appears that a great deal of work remains to
be done in this area. It is evident, though, that as better
techniques are developed and the cost of current techniques
(in memory and time) is brought lower, high-level languages
will continue to become more attractive.
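
The hazard of reordering arithmetic is easy to demonstrate.
The sketch below uses IEEE double-precision floating point
as found in Python, which is an assumption about the
arithmetic and not a property of the machines discussed in
this thesis; the values are chosen only to expose the
effect.

```python
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c   # a and b cancel first, then 1.0 is added
reordered = a + (b + c)       # b + c rounds back to -1e16, so c is lost

print(left_to_right, reordered)
```

The two groupings are algebraically identical, yet one
yields 1.0 and the other 0.0, so an optimizer that
regroups the sum changes the result of the program.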

Until some breakthrough comes in the artificial
intelligence area the most practical techniques will
probably require the programmer to provide some input to the
optimization process. He might specify that speed is most
important for certain sections of code and that the amount
of memory utilized should be minimized for other sections.
He might also specify the probabilities of certain branches
in the program (as has been done in some compilers since the
1950's [52]). The computer then will perform the "dirty
work" much more effectively than a human writing in assembly
language.







VIII. THE CONFIGURATION-INDEPENDENT COMPILER



The previous seven chapters have discussed features
available in current compilers and features which appear
feasible for future compilers. In this chapter an attempt
is made to tie together some of these ideas and discuss the
possible functioning and structure of a compiler for which a
target machine and language are not necessarily specified
prior to compilation. The level of interest in developing
such a compiler is indicated by the increasing amount of
work being done on machine-independent high-level
microprogramming and system programming languages
[18, 38, 40, 47, 57].

Ramamoorthy and Tsuchiya [47] have demonstrated a
language which appears to have many of the desired features
and which can produce control code for a complex
microprogrammable machine. Their SIMPL (Single Identity
Microprogramming Language) is intended to be
machine-independent; however, it does not appear that they
have yet addressed the problem of specifying the machine
organization to the compiler in a flexible manner.

Wilcox [61] has looked at the latter problem but has
based his work on the concept of a machine-independent
assembler. This assembler is to be used for generating
control code for digital systems built with QED functional
modules [56]. The nature of this problem is very similar to
that considered by Ramamoorthy and Tsuchiya, and there does
not seem to be any practical reason for not extending
Wilcox's concept to a machine-independent compiler.

A. THE IDEAL COMPILER

The truly ideal compiler would be one which would accept
an algorithm from the programmer in a universal programming
language, select the most appropriate hardware for the job,
and produce the code for controlling the hardware.
Obviously the compiler would require more input than just a
statement of the algorithm. It would need to have
information on what hardware was available and the
operational constraints to be placed on the resulting
system.

A compiler which could function as described above is a
goal for which compiler designers can strive, but it is one
which will probably require many more years to achieve. The
reason for this is the reason that computers have not taken
over all other engineering disciplines: there are too many
subtle tradeoffs to be made in designing a system. The
relationships between many of the variables involved cannot
be quantified, and a great deal of experience and intuition
is required to produce a good design. A large part of any
design effort is concerned with optimization of some sort,
and, as discussed in Chapter VII, this involves the area
which the computer scientist labels artificial intelligence.

Short of the ideal, the programmer (system designer)
will have to specify a few possible hardware configurations
along with the optimization functions and constraints. The
compiler will then make some simple tradeoffs among the
various configurations, choose the "optimal" one, and
produce the optimal code. In a given design, for example,
the compiler might decide that a system with three
multiplier modules would be better than one with two or four
multiplier modules.

It will probably be several years before even this
reduced capability compiler can be implemented. Based upon
what appears feasible within the next few years, the "ideal"
compiler would be even more restricted. As indicated in
Section I.A this compiler would have several inputs. In
addition to the algorithm, the programmer would specify the
hardware configuration, the format of the control code, and
some simple optimization information. A conceptual block
diagram of such a compiler is shown in Figure 24. In actual
practice it may be difficult to divide the compiler into a
set of neat boxes with definite flow of action, a fact which
is suggested by the dashed line in Figure 24. In other
words, there will probably be a strong interaction among the
various sections of such a compiler.

It will probably be especially difficult to distinguish
the architecture-dependent optimization phase from pass 2.
These two phases relate fairly closely to the final two
steps in a SIMPL compilation: the concurrency and timing
analysis step and the microoperation timing optimization
step. (The first two steps are syntactic analysis and
semantic analysis, which parse the source program and break
it into a series of subblocks.) The concurrency and timing
[Block diagram not reproduced in this copy. Legible labels:
Source Language (input), Control Code (output).]

Figure 24. Conceptual diagram of a compiler
for user-definable architectures
analysis step "... examines symbolic code in each subblock
to detect concurrently executable microoperations and to
determine their feasible execution timing." [47, p.796]
Next, the microoperation timing optimization step "...
introduces complete machine dependence.... The hardware
organization and operating characteristics are defined by
the microinstruction definition that is represented
internally in the compiler." [47, p.797]

B. INTRODUCING MACHINE DEPENDENCE

Probably the most difficult problem which will be
encountered in designing a compiler for user-definable
architectures will be that of introducing machine
dependence. Compilers for fixed architectures have
machine-dependent information scattered through all of their
phases. The configuration-independent compiler, on the
other hand, must have machine-dependent information
localized to as few areas as possible and must be structured
in such a way as to make it as easy as possible to change
this information. It is because machine dependence has to
be introduced at some stage in any practical compiler that
the term "compiling for user-definable architectures" has
been used in this thesis. The technically inaccurate term
"machine-independent compiler" is often encountered and has
the same meaning.

As indicated in Figure 24, information on optimization,
machine organization, and instruction formats will be
tabulated by the compiler. After suitable processing, the
information will be loaded into tables in much the same way
that the information from the algorithm is loaded into the
symbol table. The architecture-dependent optimization phase
and pass 2 will thus be "table-driven"; i.e., they will
extract information from the various tables and use this
information to make optimization and code generation
decisions. In the sense that they use information from the
symbol table to generate control code, the second passes of
the two current Intel PL/M compilers for the 8008 and the
8080 microprocessors can be considered to be partially
table-driven.

Tirrell [57] has reported work involving the use of a
table-driven compiler for microprogramming. In his
compiler, tables containing machine-dependent information
could be loaded prior to compilation or could be generated
during compilation. One table was used for indicating the
status of the various hardware registers and indicators,
while a second table was used to store the basic
microinstruction patterns. Other tables were used as aids
in optimizing the generated code. Most of the optimizations
involved the arrangement of elementary operations into
efficient microinstruction words (i.e., words which take
maximum advantage of parallelism).
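
The idea of a table-driven second pass can be sketched
briefly. The following is an illustration under stated
assumptions, not a description of Tirrell's compiler or of
the Intel compilers: both machine tables and the `generate`
function are hypothetical, and only the operator names ADD
and AND are taken from the intermediate language codes of
Appendix A.

```python
# Each table maps an intermediate-language operator to an instruction template.
TABLES = {
    "accumulator_machine": {"ADD": "ADD {src}", "AND": "ANA {src}"},
    "two_address_machine": {"ADD": "add {src},{dst}", "AND": "and {src},{dst}"},
}


def generate(ops, machine, tables=TABLES):
    """Translate (operator, src, dst) triples using the table for `machine`."""
    table = tables[machine]
    return [table[op].format(src=src, dst=dst) for op, src, dst in ops]


program = [("ADD", "B", "A"), ("AND", "C", "A")]
print(generate(program, "accumulator_machine"))
print(generate(program, "two_address_machine"))
```

The translation routine itself contains no machine-dependent
information; retargeting the sketch to another machine means
supplying another table.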

Another concept which deserves attention in the design
of a compiler for user-definable architectures is that of
decision-logic tables [citation illegible]. Initially
conceived to replace flow charts in business programming
applications, decision tables have developed an extensive
body of theory to enable their efficient use. A decision
table consists of a group of alternatives for a given
situation and a set of actions to be taken for each
alternative. In essence, this technique results in a
tabular program rather than a table-driven program.
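
A decision table can be sketched as a mapping from
combinations of condition outcomes to lists of actions. The
example below is hypothetical throughout: the conditions,
rules, and action names are invented for illustration and
restate the add-by-one special case of Section VII.B in
tabular form.

```python
CONDITIONS = ("operand_is_constant", "constant_is_one", "has_increment")

# Rules: tuple of condition outcomes -> list of actions.
RULES = {
    (True, True, True): ["emit INR"],
    (True, True, False): ["emit ADI"],
    (True, False, True): ["emit ADI"],
    (True, False, False): ["emit ADI"],
}
DEFAULT = ["emit ADD"]   # the "else" rule for all remaining combinations


def decide(facts):
    """Look up the actions for the given condition outcomes."""
    key = tuple(bool(facts[name]) for name in CONDITIONS)
    return RULES.get(key, DEFAULT)


print(decide({"operand_is_constant": True,
              "constant_is_one": True,
              "has_increment": True}))
```

The logic lives entirely in the table, so changing the
behavior means editing rules rather than rewriting control
flow.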

In his discussion of register-transfer languages,
Barbacci concluded with some very pertinent remarks.

The proliferation of machines introduces a problem in the
production of software. Standard languages (Fortran,
Cobol, etc.) have alleviated the problem with respect to
the user side but there still remains the variability on
the machine side. Compiler writing is an expensive task
and automatic programming systems (compiler-compilers)
have not taken into account this variability on the target
machine. If we expect to solve the problem we need
compiler-compilers that accept as inputs both the language
description (syntax and semantics) and the target machine
description. None of the existing hardware design
languages is useful in this problem and the issue is not
just the production of code, but of "good" code ....
[6, p. 148]
IX. CONCLUSIONS AND RECOMMENDATIONS



As digital large scale integrated circuits and
functional modules continue to have a greater impact on
electronic system design, the need for improved software
design and application will become ever more critical in
producing reliable, cost-effective systems. Most of the
concepts discussed in this thesis have been in existence for
a number of years, but current hardware development trends
demand that greater emphasis be placed on translating these
concepts into realities.

One of the key milestones in the effort to provide
better tools for the design of systems using the new
components will be the development of suitable high-level
programming languages for describing the algorithm. There
are well over 100 high-level languages available today, each
designed to help solve a particular problem. "... [O]ne may
question the need or desirability of all these languages.
On the other hand, for the convenience of the user, he
should be allowed to choose a language that he is
comfortable with and which best suits his application."
[46, p.3] The PL/M language, developed by Intel Corporation,
has been successfully used by firmware designers and may be
able to be used as a base for new, more comprehensive
languages. Even if completely new languages are developed,
they will probably bear a strong resemblance to PL/M.
Major effort will have to be directed toward the
development of compiler facilities which allow user
specification of the hardware aspects of his design. This
will require the development of a good hardware description
language, which may or may not be a subset of the language
discussed above for describing the algorithm. In any event,
the compiler will have the capability of manipulating this
hardware information in such a way as to facilitate the
generation of control code.

In order for this compiler to be accepted by system
designers, it will have to generate "good" control code,
with the specification of goodness being provided by the
user. Thus there is a need for the continued development of
practical compiler optimization techniques. In all of the
work to be done, the optimization problems will probably be
the most difficult to solve and the most crucial for the
success of the overall task.

The work discussed in Chapter IV has been sufficient to
indicate the feasibility of developing a high-level language
for user-definable architectures; however, there are many
questions left to be answered and several important steps
which need to be taken. The development of a formal
technique for describing the semantics of a programming
language should have a high priority in this regard.
Despite all of the theoretical work which has been done to
improve the syntax analysis and parsing processes in
compilers, very little has been done to formalize the
semantic analysis and code generation processes [30, 60].
The code generation process for the PL/M compiler is just
another translation (from intermediate language to machine
language) and might be able to be performed in a manner
similar to the parsing of the source language and generation
of intermediate language code. The use of a push-down
automaton for this process should be investigated.

Error recovery during source language parsing is another 
area which deserves additional attention. It is desirable 
to provide the programmer with as much information as 
possible, and the method discussed in Section IV.b.il is 
relatively simple. Attention in this area should also be devoted 
toward more efficient storage of error messages in order to 
help minimize the size of the compiler. One technique for 
doing this would involve the design of messages which can be 
partitioned into a relatively small number of common 
phrases. Detailed messages could then be constructed from 
these phrases. 
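
The phrase-partitioning idea can be sketched as follows. This is only an illustration under stated assumptions, not part of the work described: the phrase table, the message encoding, and the names phrases, messages and build_message are hypothetical, and present-day C is used rather than the dialect of the listings in Appendix B. The message texts are borrowed from those listings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shared phrase table; each phrase is stored only once. */
static const char *const phrases[] = {
    "identifier", "invalid here", "required",
    "label", "variable", "redeclared", "undeclared"
};

#define END_OF_MESSAGE (-1)

/* Hypothetical message table: each message is a short list of
   phrase indices terminated by END_OF_MESSAGE. */
static const signed char messages[][4] = {
    {0, 1, END_OF_MESSAGE},   /* identifier invalid here */
    {0, 2, END_OF_MESSAGE},   /* identifier required     */
    {3, 5, END_OF_MESSAGE},   /* label redeclared        */
    {4, 5, END_OF_MESSAGE},   /* variable redeclared     */
    {4, 6, END_OF_MESSAGE},   /* variable undeclared     */
};

/* Assemble message number msg into out (capacity cap bytes),
   separating phrases with single blanks.  The caller must pass a
   valid message number.  Returns 0 on success, -1 if the text
   does not fit. */
int build_message(int msg, char *out, size_t cap)
{
    size_t used = 0;
    int k;

    if (cap == 0) return -1;
    out[0] = '\0';
    for (k = 0; messages[msg][k] != END_OF_MESSAGE; k++) {
        const char *p = phrases[messages[msg][k]];
        size_t need = strlen(p) + (k > 0 ? 1 : 0);

        if (used + need + 1 > cap) return -1;   /* +1 for the terminator */
        if (k > 0) out[used++] = ' ';
        strcpy(out + used, p);
        used += strlen(p);
    }
    return 0;
}
```

With this layout each additional message costs only a few index bytes, as long as its wording can be covered by phrases already in the table.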

The next step in continuing the work described in 
Chapter IV should be the design and implementation of a 
second pass for the PL/M compiler. Several specific 
recommendations can be made here for future work in this area. 
First, a routine will have to be written to transfer the 
symbol table to a disk file. This file, along with the 
intermediate language file and the initial value file, would 
then be used as input for the second pass. The key to 
successful development of a "machine-independent" second pass 
will be the availability of a suitable hardware description 
language and compiler. When they become available, they 
should be tested by using them to write a description of the 
Intel 8008 or 8080 microprocessor. In order to produce the 
control code, routines will then have to be written to store 
the necessary information from this description in tables 
and to manipulate these tables according to the information 
received from pass 1. Until a suitable hardware description 
language is available, a more conventional pass 2 could be 
written, in the C language, for the PL/M compiler. This 
would provide a vehicle for testing various optimization 
techniques. Finally, optimization inputs must be defined, 
and the methods for utilizing them must be developed. 
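
As a rough sketch of the first recommendation, the routine that transfers the symbol table to a disk file could look something like the following. It assumes the table can be treated as a flat array of bytes containing no machine addresses; the names save_table and load_table and the file layout are hypothetical, and the code uses the present-day standard I/O library rather than the buffered I/O of the listings in Appendix B.

```c
#include <stdio.h>

/* Write nbytes of the table to the named file.
   Returns 0 on success, -1 on any failure. */
int save_table(const char *path, const unsigned char *table, size_t nbytes)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL) return -1;
    written = fwrite(table, 1, nbytes, fp);
    if (fclose(fp) != 0 || written != nbytes) return -1;
    return 0;
}

/* Read at most cap bytes of a saved table back into memory, as a
   second pass might do.  Returns the number of bytes read, or -1
   if the file cannot be opened. */
long load_table(const char *path, unsigned char *table, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    size_t got;

    if (fp == NULL) return -1;
    got = fread(table, 1, cap, fp);
    fclose(fp);
    return (long)got;
}
```

If the table holds absolute pointers, they would first have to be converted to offsets relative to the start of the table before a dump of this kind is meaningful.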
APPENDIX A 

PL/M INTERMEDIATE LANGUAGE CODES 

Prefixes 

ADR     Load Address of Symbol 
LIN     Line Number Marker 
LIT     Load Literal Value 
OPR     Stack Operator 
VAL     Symbol: Load Value 
        Procedure: Load Address 

Operators 

ADC     Add with Carry 
ADD     Add 
AND     Logical And 
ARG     Procedure Argument 
AX1     Auxiliary 1 
AX2     Auxiliary 2 
AX3     Auxiliary 3 
BIF     Built-In Function 
CSE     Case Index Operation 
CVA     Convert to Address (Double Byte) 
DAT     Data Start / Finish 
DEL     Delete 
DIS     Disable Interrupts 
DIV     Divide 
DRT     Default Return (End of Procedure) 
ENA     Enable Interrupts 
ENB     Enter Block 
END     End of Do Group 
ENP     Enter Procedure 
EQL     Test for Equal 
GEQ     Test for Greater Than or Equal 
GTR     Test for Greater Than 
HAL     Halt 
HIV     Extract High Order Byte 
INC     Increment 
INX     Subscript Index 
IOR     Logical Inclusive Or 
LEQ     Test for Less Than or Equal 
LOD     Load 
LOV     Extract Low Order Byte 
LSS     Test for Less Than 
MUL     Multiply 
NEG     Negative 
NEQ     Test for Not Equal 
NOP     No Operation 
NOT     Logical Negate 
ORG     Origin 
PRO     Procedure Call 
REM     Remainder 
RET     Return 
RTL     Rotate Left 
RTR     Rotate Right 
SBC     Subtract with Carry 
SFL     Shift Left 
SFR     Shift Right 
STD     Store Destructive 
STO     Store 
SUB     Subtract 
TRA     Unconditional Transfer 
TRC     Conditional Transfer 
XCH     Exchange 
XOR     Logical Exclusive Or 
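
As an illustration of how these codes are used, the emit routine in Appendix B appears to pack a prefix into the high four bits of a 16-bit word and the operator or symbol number into the low twelve bits. The sketch below restates that packing in present-day C. The helper names pack_il, il_prefix and il_operand are assumptions introduced here, and the numeric values follow the m.def listing as best it can be read.

```c
#include <stdint.h>

enum { DEF = 0, ADR = 1, VAL = 2, OPR = 3 };   /* prefixes, per m.def         */
enum { ADD = 2, STO = 25, XCH = 27 };          /* a few operators, per m.def  */

/* Combine a 4-bit prefix and a 12-bit operand into one 16-bit word. */
uint16_t pack_il(unsigned prefix, unsigned operand)
{
    return (uint16_t)(((prefix & 0xFu) << 12) | (operand & 07777u));
}

/* Recover the prefix (high four bits) from a packed word. */
unsigned il_prefix(uint16_t word)
{
    return (unsigned)(word >> 12);
}

/* Recover the operand (low twelve bits) from a packed word. */
unsigned il_operand(uint16_t word)
{
    return (unsigned)(word & 07777u);
}
```

For example, pack_il(OPR, ADD) yields the word a second pass would decode as "stack operator: add". The LIN and LIT prefixes are treated specially by emit and are not covered by this sketch.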
APPENDIX B 
PROGRAM LISTINGS 

FILE: m.gram 

PL/M Syntax and Semantics 

%term identifier number string 

%{ /* declarations used by actions and programs */ 
int ii,jj; char *kk,tt; 
#include "m.def" 
#include "m.decl" 
%} 

%% 

/* beginning of grammar rules section */ 

program: statementlist /* 1 */ 
    ; 

statementlist: statement /* 2 */ 
    | statementlist statement /* 3 */ 
    ; 

statement: basicstatement /* 4 */ 
        = {npush = 0; } 
    | ifstatement /* 5 */ 
        = {npush = 0; } 
    ; 



basicstatement: assignment ';' /* 6 */ 
        = {while ($1--) 
              {if (fixv[sp] > 0) emit(OPR,XCH); 
               else {setsy(symloc[sp]); 
                     emit(ADR,getsyno()); } 
               pop(1); 
               if ($1 > 0) emit(OPR,STO); 
               else emit(OPR,STD); } } 
    | group ';' /* 7 */ 
    | proceduredefinition ';' /* 8 */ 
    | returnstatement ';' /* 9 */ 
    | callstatement ';' /* 10 */ 
    | gotostatement ';' /* 11 */ 
    | declarationstatement ';' /* 12 */ 
    | 'halt' ';' /* 13 */ 
        = {emit(OPR,HAL); } 
    | 'enable' ';' /* 14 */ 
        = {emit(OPR,ENA); } 
    | 'disable' ';' /* 15 */ 
        = {emit(OPR,DIS); } 
    | ';' /* 16 */ 
    | labeldefinition basicstatement /* 17 */ 
    | error /* ERROR */ /* 18 */ 
        = {errfix(); } 
    ; 

ifstatement: ifclause statement /* 19 */ 
        = {emit(DEF,spop()); } 
    | ifclause truepart statement /* 20 */ 
        = {emit(DEF,spop()); } 
    | labeldefinition ifstatement /* 21 */ 
    ; 

ifclause: 'if' expression 'then' /* 22 */ 
        = {emit(VAL,spush(nsym++)); 
           emit(OPR,TRC); } 
    ; 

truepart: basicstatement 'else' /* 23 */ 
        = {ii = spop(); 
           emit(VAL,spush(nsym++)); 
           emit(OPR,TRA); 
           emit(DEF,ii); } 
    ; 

group: grouphead ending /* 24 */ 
        = {exitblk(); 
           if ($2 >= 0) {flag("identifier invalid here"); 
                         pop(1); } 
           switch($1 & 03) 
              {case 0: /* simple group */ 
                  emit(OPR,END); break; 
               case 1: if ($1 & 04) /* stepdef w/BY */ 
                     {emit(VAL,spop()); 
                      emit(OPR,TRA); emit(DEF,spop()); } 
                  else /* stepdef w/o BY */ 
                     {emit(VAL,ii = symfind(sp)); 
                      emit(OPR,INC); 
                      emit(ADR,ii); emit(OPR,STD); 
                      ii = spop(); emit(VAL,spop()); 
                      emit(OPR,TRA); emit(DEF,ii); } 
                  pop(1); break; 
               case 2: /* while group */ 
                  ii = spop(); emit(VAL,spop()); 
                  emit(OPR,TRA); emit(DEF,ii); break; 
               case 3: /* case group */ 
                  ii = spop(); spop(); jj = $1 >> 2; 
                  kk = csp + (jj << 1); 
                  emit(DEF,maketwo(*kk,*(kk+1))); 
                  emit(OPR,CSE); 
                  while (jj--) 
                     {kk =- 2; 
                      emit(VAL,maketwo(*kk,*(kk+1))); 
                      emit(OPR,AX2); } 
                  emit(DEF,ii); 
                  for (jj = ($1 >> 2) + 1; jj--; ) spop(); 
                  break; } } 
    ; 

grouphead: 'do' ';' /* 25 */ 
        = {enterblk(); emit(OPR,ENB); $$ = 0; } 
    | 'do' stepdefinition ';' /* 26 */ 
        = {enterblk(); $$ = 1+$2; } 
    | 'do' whileclause ';' /* 27 */ 
        = {enterblk(); $$ = 2; } 
    | 'do' caseselector ';' /* 28 */ 
        = {enterblk(); $$ = 3; 
           emit(VAL,spush(nsym++)); emit(OPR,AX1); 
           emit(DEF,spush(nsym++)); spush(nsym++); } 
    | grouphead statement /* 29 */ 
        = {if ((($$ = $1) & 03) == 3) 
              {emit(VAL,ii = spop()); emit(OPR,TRA); 
               emit(DEF,spush(nsym++)); 
               spush(ii); $$ =+ 4; } } 
    ; 

stepdefinition: variable replace expression iterationcontrol /* 30 */ 
        = {$$ = $4; } 
    ; 

iterationcontrol: to expression /* 31 */ 
        = {$$ = 0; emit(OPR,LEQ); 
           emit(VAL,spush(nsym++)); emit(OPR,TRC); } 
    | to expression by expression /* 32 */ 
        = {emit(VAL,ii = symfind(sp)); emit(OPR,ADD); 
           emit(ADR,ii); emit(OPR,STD); $$ = 4; 
           emit(VAL,spop()); emit(OPR,TRA); 
           emit(DEF,spop()); } 
    ; 

whileclause: while expression /* 33 */ 
        = {emit(VAL,spush(nsym++)); 
           emit(OPR,TRC); } 
    ; 

caseselector: 'case' expression /* 34 */ 
    ; 

proceduredefinition: procedurehead statementlist ending /* 35 */ 
        = {if ($3 < 0) flag("identifier required"); 
           else {setsy($1); ii = getsyno(); 
                 if (ii != symfind($3)) 
                    flag("incorrect identifier"); 
                 pop(1); } 
           exitblk(); emit(OPR,END); 
           emit(OPR,DRT); emit(DEF,spop()); } 
    ; 

procedurehead: procedurename ';' /* 36 */ 
        = {procode($$ = $1); } 
    | procedurename type ';' /* 37 */ 
        = {setsy($$ = $1); setprec($2); procode($1); } 
    | procedurename parameterlist ';' /* 38 */ 
        = {setsy($$ = $1); setlen($2); procode($1); } 
    | procedurename parameterlist type ';' /* 39 */ 
        = {setsy($$ = $1); setlen($2); 
           setprec($3); procode($1); } 
    | procedurename 'interrupt' number ';' /* 40 */ 
        = {procode($$ = $1); } 
    ; 

procedurename: identifier 'procedure' /* 41 */ 
        = {if (symloc[$1] >= curlev) 
              flag("illegal procedure name"); 
           fixv[$1] = 0; $$ = sytop; 
           enter($1,prot,0,0); compress($$,1); 
           pop(1); emit(OPR,ENP); 
           enterblk(); } 
    ; 

parameterlist: parameterhead identifier ')' /* 42 */ 
        = {if (acnt >= 63) 
              {flag("too many parameters"); acnt = 62;} 
           setsy($1); 
           fixv[$2] = 0; $$ = ++acnt + (getlast() << 6); 
           enter($2,undef,0,1); compress($1,acnt); 
           pop(1); } 
    ; 

parameterhead: '(' /* 43 */ 
        = {$$ = sytop; acnt = 0; } 
    | parameterhead identifier ',' /* 44 */ 
        = {$$ = $1; acnt++; fixv[$2] = 0; 
           enter($2,undef,0,1); 
           pop(1); } 
    ; 

ending: 'end' /* 45 */ 
        = {$$ = -1; } 
    | 'end' identifier /* 46 */ 
        = {$$ = $2; } 
    | labeldefinition ending /* 47 */ 
        = {$$ = $2; } 
    ; 



labeldefinition: identifier ':' /* 48 */ 
        = {labflag++; 
           if ((ii = symloc[$1]) >= curlev) 
              {setsy(ii); 
               if (getlen()) 
                  {ii = getsize() + finfo + 1; 
                   *(symbol + ii) =& 03; 
                   emit(DEF,getsyno()); 
                  } 
               else flag("label redeclared"); 
              } 
           else 
              {fixv[$1] = 0; 
               $$ = sytop; 
               enter($1,labt,0,0); compress($$,1); 
               emit(DEF,nsym-1); 
              } } 
    | number ':' /* 49 */ 
        = {emit(LIT,$1); emit(OPR,ORG); } 
    ; 

returnstatement: 'return' /* 50 */ 
        = {emit(LIT,0); emit(OPR,RET); } 
    | 'return' expression /* 51 */ 
        = {emit(OPR,RET); } 
    ; 

callstatement: 'call' variable /* 52 */ 
        = {setsy(symloc[$2]); emit(VAL,getsyno()); 
           if ((ii = gettype()) == prot) emit(OPR,PRO); 
           else {if (ii == cprot) emit(OPR,BIF); 
                 else flag("variable not a procedure name");} 
           pop(1); } 
    ; 

gotostatement: goto identifier /* 53 */ 
        = {emit(VAL,symfind($2)); emit(OPR,TRA); pop(1); } 
    | goto number /* 54 */ 
        = {emit(LIT,$2); emit(OPR,TRA); } 
    ; 

goto: 'go' 'to' /* 55 */ 
    | 'goto' /* 56 */ 
    ; 

declarationstatement: 'declare' declarationelement /* 57 */ 
    | declarationstatement ',' declarationelement /* 58 */ 
    ; 

declarationelement: typedeclaration /* 59 */ 
        = {setsy(curlev - *curlev); 
           if (gettype() == prot) 
              {ii = getlen(); 
               while (ii--) 
                  {setsy(symbol + finfo + getsize() + 2); 
                   if (gettype() == vart && getprec() == 0) 
                      setprec($1); 
                  } 
              } } 
    | identifier 'literally' string /* 60 */ 
        = { 
           /* enter a macro definition */ 
           symbol = sytop; 
           setname($1); /* fills size, name, hcoll */ 
           fixhcoll(); 
           settype(mact); 
           /* set the macro definition size */ 
           setsym((ii = getsize() + finfo), 
                  getvarc(jj = var[$3])); 
           setchar(++ii,jj); /* fills the macro definition */ 
           /* note that last field filled is at end of entry */ 
           syfin(); 
           pop(2); } 
    | identifier datalist /* 61 */ 
        = {if (symcheck($1)) 
              {fixv[$1] = 0; ii = sytop; jj = ($2 > 63); 
               enter($1, jj ? lvect : svect,1,$2);} 
           if (!jj) compress(ii,1); 
           else {setsy(ii); fixhcoll();} 
           pop(1); emit(OPR,DAT); emit(DEF,spop()); } 
    ; 

datalist: datahead constant ')' /* 62 */ 
        = {$$ = $1 + concode($2,datcon); } 
    ; 

datahead: 'data' '(' /* 63 */ 
        = {$$ = 0; emit(VAL,spush(nsym++)); 
           emit(OPR,TRA); emit(OPR,DAT); 
           emit(DEF,nsym); } 
    | datahead constant ',' /* 64 */ 
        = {$$ = $1 + concode($2,datcon); } 
    ; 

typedeclaration: identifierspecification type /* 65 */ 
        = {$$ = $2; ii = $1; 
           if ($2 != 1) change(vart,$2,1,acnt); 
           compress($1,acnt); } 
    | boundhead number ')' type /* 66 */ 
        = {if (!$4) flag("illegal declaration"); 
           $$ = $4; ii = $1; 
           if ($2 > 63) change(lvect,$4,$2,acnt); 
           else {change(svect,$4,$2,acnt); 
                 compress($1,acnt); }} 
    | typedeclaration initiallist /* 67 */ 
        = {$$ = $1; } 
    ; 

type: 'byte' /* 68 */ 
        = {$$ = tt = 1; } 
    | 'address' /* 69 */ 
        = {$$ = tt = 2; } 
    | 'label' /* 70 */ 
        = {$$ = tt = 0; } 
    ; 

boundhead: identifierspecification '(' /* 71 */ 
        = {$$ = $1; } 
    ; 

identifierspecification: variablename /* 72 */ 
        = {$$ = $1; if (jj) acnt = 1; else acnt = 0; } 
    | identifierlist variablename ')' /* 73 */ 
        = {if (acnt++) $$ = $1; else $$ = $2; } 
    ; 

identifierlist: '(' /* 74 */ 
        = {acnt = 0; } 
    | identifierlist variablename ',' /* 75 */ 
        = {if (acnt++) $$ = $1; else $$ = $2; } 
    ; 

variablename: identifier /* 76 */ 
        = {if (symcheck($1)) 
              {fixv[$1] = 0; 
               $$ = sytop; 
               enter($1,vart,1,1);} 
           pop(1); } 
    | basedvariable identifier /* 77 */ 
        = {if (fixv[$2] != foundv) 
              flag("base not defined"); 
           else {ii = getsyno(); 
                 setsy($1); setbsyno(ii); } 
           $$ = $1; 
           pop(1); } 
    ; 

basedvariable: identifier 'based' /* 78 */ 
        = {if (symcheck($1)) 
              {fixv[$1] = basev; 
               $$ = sytop; 
               enter($1,vart,1,1);} 
           pop(1); } 
    ; 

initiallist: initialhead constant ')' /* 79 */ 
        = {$$ = $1 + concode($2,tt); 
           if ($$) 
              {putw($$,&buf3); setsy(ii); 
               putw(getsyno(),&buf3); } } 
    ; 

initialhead: 'initial' '(' /* 80 */ 
        = {$$ = 0; if (!tt) 
              flag("initial not allowed here"); } 
    | initialhead constant ',' /* 81 */ 
        = {$$ = $1 + concode($2,tt); } 
    ; 

assignment: variable replace expression /* 82 */ 
        = {$$ = 1; } 
    | leftpart assignment /* 83 */ 
        = {$$ = ++$2; } 
    ; 

replace: '=' /* 84 */ 
    ; 

leftpart: variable ',' /* 85 */ 
    ; 

expression: logicalexpression /* 86 */ 
    | variable ':' '=' logicalexpression /* 87 */ 
        = {if (fixv[$1]) emit(OPR,XCH); 
           else emit(ADR,symfind($1)); 
           emit(OPR,STO); pop(1); } 
    ; 

logicalexpression: logicalfactor /* 88 */ 
    | logicalexpression 'or' logicalfactor /* 89 */ 
        = {emit(OPR,IOR); } 
    | logicalexpression 'xor' logicalfactor /* 90 */ 
        = {emit(OPR,XOR); } 
    ; 

logicalfactor: logicalsecondary /* 91 */ 
    | logicalfactor 'and' logicalsecondary /* 92 */ 
        = {emit(OPR,AND); } 
    ; 

logicalsecondary: logicalprimary /* 93 */ 
    | 'not' logicalprimary /* 94 */ 
        = {emit(OPR,NOT); } 
    ; 

logicalprimary: arithmeticexpression /* 95 */ 
    | arithmeticexpression relation arithmeticexpression /* 96 */ 
        = {emit(OPR,$2); } 
    ; 



relation: '=' /* 97 */ 
        = {$$ = EQL; } 
    | '<' /* 98 */ 
        = {$$ = LSS; } 
    | '>' /* 99 */ 
        = {$$ = GTR; } 
    | '<' '>' /* 100 */ 
        = {$$ = NEQ; } 
    | '<' '=' /* 101 */ 
        = {$$ = LEQ; } 
    | '>' '=' /* 102 */ 
        = {$$ = GEQ; } 
    ; 

arithmeticexpression: term /* 103 */ 
    | arithmeticexpression '+' term /* 104 */ 
        = {emit(OPR,ADD); } 
    | arithmeticexpression '-' term /* 105 */ 
        = {emit(OPR,SUB); } 
    | arithmeticexpression 'plus' term /* 106 */ 
        = {emit(OPR,ADC); } 
    | arithmeticexpression 'minus' term /* 107 */ 
        = {emit(OPR,SBC); } 
    | '-' term /* 108 */ 
        = {emit(OPR,NEG); } 
    ; 

term: primary /* 109 */ 
    | term '*' primary /* 110 */ 
        = {emit(OPR,MUL); } 
    | term '/' primary /* 111 */ 
        = {emit(OPR,DIV); } 
    | term 'mod' primary /* 112 */ 
        = {emit(OPR,REM); } 
    ; 





primary: constant /* 113 */ 
        = {if (conlast == stringc) 
              {ii = var[sp] + 1; 
               switch ($1) 
                  {case 1: emit(LIT,getvarc(ii)); break; 
                   default: 
                      flag("string must be 1 or 2 chars"); 
                   case 2: 
                      emit(LIT,maketwo(getvarc(ii + 1), 
                                       getvarc(ii))); } 
               pop(1); 
              } 
           else emit(LIT,$1); } 
    | '.' constant /* 114 */ 
        = {emit(VAL,spush(nsym++)); emit(OPR,TRA); 
           emit(OPR,DAT); emit(DEF,0); 
           concode($2,pricon); 
           emit(OPR,DAT); emit(DEF,spop()); } 
    | constanthead constant ')' /* 115 */ 
        = {concode($2,pricon); 
           emit(OPR,DAT); emit(DEF,spop()); } 
    | variable /* 116 */ 
        = {ii = symfind($1); 
           if (ii >= 0) 
              {switch(gettype()) 
                  {case prot: emit(VAL,ii); emit(OPR,PRO); 
                      break; 
                   case cprot: emit(VAL,ii); emit(OPR,BIF); 
                      break; 
                   default: if (!fixv[$1]) emit(VAL,ii); 
                      else emit(OPR,LOD); } } 
           pop(1); } 
    | '.' variable /* 117 */ 
        = {if (!fixv[$2]) emit(ADR,symfind($2)); 
           emit(OPR,CVA); pop(1); } 
    | '(' expression ')' /* 118 */ 
    ; 

constanthead: '.' '(' /* 119 */ 
        = {emit(VAL,spush(nsym++)); emit(OPR,TRA); 
           emit(OPR,DAT); emit(DEF,0); } 
    | constanthead constant ',' /* 120 */ 
        = {concode($2,pricon); } 
    ; 

variable: identifier /* 121 */ 
        = {undec(); fixv[$$ = $1] = 0; } 
    | subscripthead expression ')' /* 122 */ 
        = {ii = symfind($1); ++fixv[$$ = $1]; 
           if ((jj = gettype()) != prot && jj != cprot) 
              emit(OPR,INX); 
           else emit(OPR,ARG); } 
    ; 



subscripthead: identifier '(' /* 123 */ 
        = {undec(); fixv[$$ = $1] = 0; 
           ii = symfind($1); 
           if ((jj = gettype()) != prot && jj != cprot) 
              emit(ADR,ii); } 
    | subscripthead expression ',' /* 124 */ 
        = {ii = symfind($1); 
           if ((jj = gettype()) == prot || jj == cprot) 
              emit(OPR,ARG); } 
    ; 

constant: string /* 125 */ 
        = {$$ = getvarc(var[$1]); } 
    | number /* 126 */ 
        = {$$ = $1; } 
    ; 

to: 'to' /* 127 */ 
        = {emit(ADR,ii = symfind(sp)); 
           emit(OPR,STD); emit(DEF,spush(nsym++)); 
           emit(VAL,ii); } 
    ; 

by: 'by' /* 128 */ 
        = {emit(OPR,LEQ); ii = spop(); 
           emit(VAL,spush(nsym++)); 
           emit(OPR,TRC); emit(VAL,jj = nsym++); 
           emit(OPR,TRA); emit(DEF,spush(nsym++)); 
           spush(jj); spush(ii); } 
    ; 

while: 'while' /* 129 */ 
        = {emit(DEF,spush(nsym++)); } 
    ; 

%% 

/* beginning of programs section */ 

#include "m.scan.c" 



FILE: m.def 

Macro Definitions 

#define true        1 
#define false       0 
#define quote       39 
#define do_forever  while(1) 
#define unknown     -128 
#define EOF         -1 
#define SIGN        0100000 

#define errc        0 
#define identc      1 
#define numbc       2 
#define stringc     3 
#define specl       4 
#define eofc        5 
#define num8        6 
#define datcon      8 
#define pricon      9 
#define hashmask    127 

#define binv        2 
#define octv        8 
#define decv        10 
#define hexv        16 

/* size of symbol table is symsmax + 1 */ 
#define symsmax     4096 
#define maxsyno     1023    /* syno field is 10 bits */ 
#define maxlen      16383   /* length field is 14 bits */ 
#define varcmax     127     /* last location in varc */ 
#define stackmax    24      /* top of parsing stacks */ 
/* maximum number of levels of macro nesting */ 
#define macmax      10 
#define maxblk      19      /* maximum block nesting level */ 

#define foundv      2 
#define basev       1 

/* symbol table fields */ 
#define lastf 
#define typef 
#define sizef 
#define namef 
#define finfo               /* fixed info in 'symbols' */ 

/* symbol table types */ 
#define rest        15 
#define undef       0 
#define mact        1 
#define vart        2 
#define arrt        3 
#define strt        4 
#define labt        5 
#define prot        6 
#define cvart       7 
#define cprot       8 
#define ivart       9 
#define outpt       10 
#define lvect       11 
#define svect       12 
#define clabt       13 

/* Operators for "emit" */ 
#define DEF   0 
#define ADR   1 
#define VAL   2 
#define OPR   3 
#define LIN   4 
#define LIT   5 

/* Polish Operators for "emit" */ 
#define NOP   1 
#define ADD   2 
#define SUB   3 
#define ADC   4 
#define SBC   5 
#define MUL   6 
#define DIV   7 
#define REM   8 
#define NEG   9 
#define AND   10 
#define IOR   11 
#define XOR   12 
#define NOT   13 
#define EQL   14 
#define LSS   15 
#define GTR   16 
#define NEQ   17 
#define LEQ   18 
#define GEQ   19 
#define INX   20 
#define TRA   21 
#define TRC   22 
#define PRO   23 
#define RET   24 
#define STO   25 
#define STD   26 
#define XCH   27 
#define DEL   28 
#define DAT   29 
#define LOD   30 
#define BIF   31 
#define INC   32 
#define CSE   33 
#define END   34 
#define ENB   35 
#define ENP   36 
#define HAL   37 
#define RTL   38 
#define RTR   39 
#define SFL   40 
#define SFR   41 
#define HIV   42 
#define LOV   43 
#define CVA   44 
#define ORG   45 
#define DRT   46 
#define ENA   47 
#define DIS   48 
#define AX1   49 
#define AX2   50 
#define AX3   51 
#define ARG   52 


FILE: m.decl 

Global Variable Declarations 

int ii,jj; char *kk,tt; 

int nsym; /* next symbol number */ 

/* npush is no. of symbols pushed for current statement */ 
char npush; 

/* labflag indicates if current statement has a label */ 
char labflag; 

int acnt; 
char conlast; 

int yyline; /* line count */ 
int yydebug; /* debug switch */ 

/* compiler toggles */ 
char togd, /* debug */ 
     togp, /* production listing */ 
     togt; /* token listing */ 

/* line limits for toggles */ 
int liml, /* lower limit */ 
    limu; /* upper limit */ 

char token, stype, hashcode, lastc, nextc; 
int value; 
char errset; 

char *hentry[hashmask + 1]; 

/* 'symbol' is the base address of the currently referenced 
   symbol table entry, 'sytop' is the current top of the 
   symbol table. 'tokrel' is used to hold 'symbol' for 
   certain tokens during syntax analysis, and eventually 
   makes it to the 'symloc' stack corresponding to the 
   element (if zero, the token was either not looked up or not 
   found). */ 

char symbols[symsmax + 1]; /* symbol table */ 
char *symbol,*sytop,*tokrel; 
char maxsy,  /* min(254, &symbols[symsmax] - symbol) */ 
     sylast; /* last location filled during 
                symbol table construction */ 
int syrel, /* relative address of current symbol */ 
    syres; /* first symbol location after reserved words */ 

/* token accumulation */ 
char varc[varcmax + 1]; /* temporary character storage */ 
int varindex, /* next free varc location */ 
    tokindex, /* start of accumulator in varc */ 
    acclen;   /* length of accumulated token */ 

/* parsing stacks */ 
char hash[stackmax + 1]; /* hash code for entry */ 
int fixv[stackmax + 1], /* temporary use during parse */ 
    var[stackmax + 1];  /* start location in varc for entry */ 
char *symloc[stackmax+1]; /* symbol table location */ 
char sp; /* stack pointer */ 
char *csp; /* symbol number stack pointer */ 

/* mactop is the current top of the macro expansion stack, 
   and, when mactop is greater than zero, *macaddr[mactop] 
   points to the current symbol string being expanded in the 
   symbol table. the maclen table gives the number of 
   characters remaining to expand at this level. */ 
char mactop, *macaddr[macmax + 1]; 
char maclen[macmax + 1]; 
char macnext[macmax + 1]; /* holds 'nextc' for each level */ 

/* block keeps track of the current symbol table top for 
   each block level. the variable blklev points to the 
   current block level in block. the value of curlev is 
   block[blklev]. blkv is a stack used for saving the 
   value of npush at each level. */ 
char *block[maxblk+1],*curlev; 
char blkv[maxblk + 1]; 
char blklev; 

/* buf is a structure used for buffering io */ 
struct buf { 
    int fildes;     /* file designator */ 
    int numused; 
    char *nxtfree;  /* buffer pointer */ 
    char buff[512]; /* 512 byte buffer */ 
}; 

struct buf buf1; /* buffer for "plm.i.l" */ 
struct buf buf2; /* buffer for getc */ 
struct buf buf3; /* buffer for "plm.i.v" */ 




FILE: m.act.c 

Procedures Invoked by Semantic Actions 

#include "m.def" 
#include "m.decl" 

symcheck(i) 
char i; { 
    if (symloc[i] >= curlev) 
        {setsy(symloc[i]); 
         if (gettype() != undef) 
            {flag("variable redeclared"); 
             return (jj = true);} 
         acnt--; settype(vart); return (jj = false); 
        } 
    return(jj = true); } 

symfind(i) 
char i; { 
    if (i < 0) return(-1); 
    if (symloc[i] != symbols) 
        {setsy(symloc[i]); 
         switch(gettype()) 
            {case vart: case cvart: 
             case prot: case cprot: 
             case lvect: case svect: 
             case ivart: case outpt: return(getsyno()); 
                break; 
             default: 
                flag("identifier cannot be a variable"); 
                return(-1); } } 
    else 
        {flag("undeclared variable"); return(-1);} } 

emit(a1,a2) 
char a1; int a2; { 
    if (errset) return; 
    switch(a1) { 
        case LIN: a2 = (a2 & 017777) | 040000; break; 
        case LIT: putc(a1 << 4,&buf1); break; 
        default: a2 = (a2 & 007777) | (a1 << 12); } 
    putc(high(a2),&buf1); 
    putc(low(a2),&buf1); } 

/* note that the spush and spop routines operate on 
   'cstack', which is actually an area at the top 
   of the symbol table. */ 

spush(sn) 
/* push a symbol number onto cstack */ 
int sn; { 
    if (csp <= sytop + 1) 
        {tflag("cstack overflow");} 
    *(--csp) = high(sn); 
    *(--csp) = low(sn); 
    return(sn); } 

spop() { 
    /* pop a symbol number from cstack */ 
    char i; 
    if (csp >= &symbols[symsmax]) 
        {flag("cstack underflow"); 
         return(-1); } 
    i = *(csp++); 
    return(maketwo(i,*(csp++))); } 

procode(sy) 
/* emit code for a <PROCEDURE HEAD> */ 
int sy; { 
    emit(VAL,spush(nsym++)); 
    emit(OPR,TRA); 
    setsy(sy); 
    emit(DEF,getsyno()); } 

concode(v,t) 
/* emit code for constants */ 
int v; char t; { 
    char i,j; 
    if (conlast == stringc) 
        {for (i = 1; i <= v; i++) 
            {j = getvarc(i + var[sp]); 
             if (t < datcon) putc(j,&buf3); 
             else emit(LIT,j); 
            } 
         pop(1); return(v); 
        } 
    /* constant is a number */ 
    if (t >= datcon) 
        {emit(LIT,v); return((v > 255 || v < 0) + 1); } 
    /* initial constant */ 
    switch (t) { 
        case 0: return(0); 
        case 1: putc(v,&buf3); 
                if (conlast != num8) 
                    flag("single byte constant required"); 
                return(1); 
        case 2: putw(v,&buf3); 
                return(2); } } 
FILE: m.aux.c 
Auxiliary Procedures 

#include "m.def" 
#include "m.decl" 

char high(n) /* return high byte of an integer */ 
int n; { 
    return (n >> 8); } 

char low(n) /* return low byte of an integer */ 
int n; { 
    return (n); } 

int maketwo(i,j) 
/* return 16 bit value constructed from i and j */ 
char i,j; { 
    return ((j << 8) | (i & 0377)); } 

int norm(i) 
char i; { 
    /* ensures that chars with msb = 1 
       are converted to integers in the 
       range (128,255) rather than to 
       negative integers */ 
    return (i < 0 ? 256 + i : i); } 

push(i) 
/* stack the last token in varc */ 
char i; { 
    npush++; 
    if (++sp > stackmax) 
        {flag("stack overflow"); sp = 0; npush = 0;} 
    var[sp] = tokindex; 
    varc[tokindex] = acclen; 
    tokindex =+ acclen + 1; 
    /* varc is ready for another token */ 
    fixv[sp] = i; 
    hash[sp] = hashcode; 
    symloc[sp] = tokrel; } 

pop(n) 
/* remove n tokens from the stacks */ 
char n; { 
    if (labflag) {n =+ labflag; labflag = 0;} 
    for (; n > 0; n--) 
        {npush--; 
         if (sp < 0) 
            {flag("stack underflow"); 
             sp = -1; npush = 0; return;} 
         tokindex =- varc[var[sp--]] + 1; 
        } } 
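
The routines high, low, maketwo and norm above work around the fact that char is a signed type in the C dialect of the listing. A rough present-day equivalent, assuming the fixed-width unsigned types of <stdint.h> are available, is sketched below; the names high_byte, low_byte and make_two are introduced here for illustration and do not appear in the original program.

```c
#include <stdint.h>

/* Return the high-order byte of a 16-bit value. */
uint8_t high_byte(uint16_t n)
{
    return (uint8_t)(n >> 8);
}

/* Return the low-order byte of a 16-bit value. */
uint8_t low_byte(uint16_t n)
{
    return (uint8_t)(n & 0xFFu);
}

/* Rebuild a 16-bit value from its low and high bytes; the argument
   order (low byte first) mirrors maketwo(i,j) in the listing. */
uint16_t make_two(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

Because the byte type is unsigned, no masking with 0377 and no norm-style correction is needed when a byte with its top bit set is widened.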
FILE: m.err.c 
Error Routines 

#include "m.def" 
#include "m.decl" 

flag(err) 
char *err; { 
    printf("\ncompile error, line %d : %s\n", 
        yyline,err); } 

tflag(err) 
char *err; 
    {flag(err); 
     exit(); } 

errfix() { 
    /* procedure for error recovery following 
       discovery of a syntax error on input */ 
    errset = true; 
    pop(npush); } 

undec() { 
    /* check for undeclared variables */ 
    if (fixv[sp] != foundv) 
        {flag("variable undeclared"); 
         enter(sp,undef,0,1); 
         setsy(sytop - *sytop); 
         symloc[sp] = symbol; 
         fixhcoll(); 
        } } 



FILE: m.scan.c 

Lexical Analysis Routines 

/* lexical analysis */ 

char getvarc(i) 
char i; 
    {return (varc[norm(i)]); } 

char gnc() 
    { /* get next input char */ 
      char i; int j; 
      while (true) 
         {if (((i=getc(&buf2)) != '\r') && 
              (i != '\n')) 
             return(i); 
          else 
             {if (togd) 
                 {if (yyline == liml) yydebug = true; 
                  if (yyline == limu + 1) yydebug = false; 
                 } 
              emit(LIN,++yyline); 
             } 
         } } 

char readinp() { 
    char c; 
    if (mactop > 0) /* then expanding a macro */ 
        {if (!(--maclen[mactop])) /* maclen == 0 */ 
            /* then end of macro expansion; restore nextc */ 
            return (macnext[--mactop]); 
         /* otherwise continue expansion */ 
         return (*(++macaddr[mactop])); 
        } 
    /* otherwise read from input device */ 
    return (gnc()); 
} 

zeroacc() 
    { /* zero accum parameters */ 
      stype = hashcode = acclen = value = 0; } 

saver() 
    { /* save characters in the accumulator, and compute 
         the hashcode */ 
      int i; 
      hashcode = (hashcode + nextc) & hashmask; 
      if ((i = ++acclen + tokindex) >= varcmax) 
         {flag("vo"); 
          acclen=0; } 
      else varc[i] = nextc; } 

char numeric() 
    { /* return true if nextc is numeric */ 
      return (norm(nextc - '0') <= 9); } 

char hex() 
    { /* return true if nextc is hexadecimal */ 
      return(numeric()||(norm(nextc-'a') <= 5)); } 

char letter() 
    { /* return true if nextc is a letter */ 
      return (norm(nextc - 'a') <= 25); } 

char alphanum() 
    { /* return true if nextc is alphanumeric */ 
      return (numeric() || letter()); } 
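
The character-class tests above rely on one idea: after subtracting the first character of a range, any character outside the range either exceeds the range width or becomes negative, and norm turns the negative case into a large positive value, so a single comparison is enough. A present-day rendering of the same technique, assuming an ASCII-like character set in which the digits and the lower-case letters are each contiguous, is sketched here; the function names are illustrative and not taken from the listing.

```c
#include <stdbool.h>

/* True if c is one of '0'..'9'.  The cast to unsigned char makes a
   negative difference wrap around to a value greater than 9. */
bool is_digit(char c)
{
    return (unsigned char)(c - '0') <= 9;
}

/* True if c is one of 'a'..'z' (assumes contiguous letters). */
bool is_lower_letter(char c)
{
    return (unsigned char)(c - 'a') <= 25;
}

/* True if c is a digit or one of 'a'..'f'. */
bool is_hex_digit(char c)
{
    return is_digit(c) || (unsigned char)(c - 'a') <= 5;
}

/* True if c is a digit or a lower-case letter. */
bool is_alphanumeric(char c)
{
    return is_digit(c) || is_lower_letter(c);
}
```

The cast to unsigned char plays the role of norm in the listing: it maps every out-of-range difference to a value above the upper bound of the test.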



gettoken() {/* get tokens for the parser */ 
    char b, d, i, neg; 
    int v; 

    zeroacc(); 

    { /* find initial character */ 
      {token = 0; 
       while (token == 0) 
          { /* deblank input */ 
            while ((nextc == unknown) || (nextc == ' ') 
                   || (nextc == '\t')) 
               nextc = readinp(); 
            /* check symbol class */ 
            if (letter()) token = identc; else 
            if (numeric()) token = numbc; else 
            if (nextc == quote) 
               {token = stringc; nextc = unknown;} else 
            /* this must be a special char */ 
               {lastc = nextc; saver(); nextc = unknown; 
                if (lastc == '/') 
                   { /* may be a comment */ 
                     if ((nextc=readinp())=='*') 
                        {while (!(((nextc=readinp())=='/') 
                                  && (lastc == '*'))) lastc = nextc; 
                         nextc=unknown; zeroacc();} 
                     else /* just a / */ token = specl;} 
                else 
                   if (lastc == EOF) token = eofc; 
                   else token=specl; 
                if (token != 0) return;}} 
            /* end of checks for symbol class */} 
       /* end of check for token = 0 */ } 

    /* symbol type discovered, scan remainder */ 
    while (true) 
       {if (nextc != unknown) saver(); 
        lastc=nextc; nextc = readinp(); 
        if (token == identc) 
           {if (nextc == '$') nextc = unknown; else 
            if (!alphanum()) return;} 
        else 
        if (token == numbc) 
           {if (nextc == '$') nextc = unknown; else 
            if (!hex()) /* end of number found */ 
               {if ((nextc == 'o') || (nextc == 'q')) stype=octv; 
                else 
                   if (nextc=='h') stype = hexv; 
                if (stype > 0) nextc = unknown; 
                else 
                   if (lastc=='b') 
                      {--acclen; stype=binv;} 
                   else 
                      if (lastc=='d') 
                         {--acclen; stype=decv;} 
                      else stype=decv; 
                /* now convert number to binary */ 
                value = 0; neg = false; 
                for (i=1; i<=acclen; i++) 
                   {if ((d=getvarc(i + tokindex)) >= 'a') 
                       d=d-'a'+10; else d=d-'0'; 
                    if ((b = stype) <= d) token = errc; 
                    v = value; value = d; 
                    while (b =>> 1) 
                       {if (v & SIGN) token = errc; 
                        v =<< 1; 
                        if (b & 1) 
                           {if ((value | v) & SIGN) neg = true; 
                            value =+ v; 
                            if (neg && !(value & SIGN)) token = errc; 
                           } 
                       } 
                   } 
                /* binary equivalent is in 'value' */ 
                return;} } 
        else 
        if (token==stringc) 
           {if (nextc == quote) 
               {if ((nextc = readinp()) != quote) return;} 
           } 
       } 
} 



prntok() {
    /* print token info */
    int i;

    putchar('\n');
    for (i = 1; i <= acclen; i++)
        putchar(varc[tokindex + i]);
    printf("\nt=%d s=%d l=%d v=%l h=%d",
        token,stype,acclen,value,hashcode);
    putchar('\n'); }

yylex() {
    /* lexical analyzer -- interface between
       yyparse and gettoken */
    char i;

    tokrel = symbols;
    do_forever {
        gettoken();
        if (togt && yyline >= liml && yyline <= limu)
            prntok();
        switch (token) {
        case eofc:
            return ('\0');
        case specl:
            return (lastc);
        case stringc:
            push(0);
            conlast = stringc;
            yylval = sp;
            return (string);
        case numbc:
            conlast = (high(value)) ? numbc : numd;
            yylval = value;
            return (number);
        case identc:
            lookup();
            tokrel = symbol;
            if (found())
                switch (gettype()) {
                case rest:
                    return (getresno() + 256);
                case mact:
                    /* start macro expansion */
                    /* save lookahead character for
                       restoration following the
                       macro expansion */
                    macnext[mactop] = nextc;
                    nextc = unknown;
                    if (++mactop > macmax)
                        {mactop = 0; flag("md");}
                    /* set up definition */
                    maclen[mactop] = getmsize() + 1;
                    macaddr[mactop] = getmdef();
                    break;
                default:
                    push(foundv);
                    yylval = sp;
                    return (identifier); } /* end of if(found()) */
            else if (acclen == 3
                    && getvarc(tokindex + 1) == 'e'
                    && getvarc(tokindex + 2) == 'o'
                    && getvarc(tokindex + 3) == 'f')
                return ('\0'); /* eof */
            else { /* unknown identifier */
                push(0);
                yylval = sp;
                return (identifier); }
            /* end of unknown */
            break; /* end of case identc */

        case errc:
            flag(" number conversion error");
            yylval = value;
            return (number);

        } /* end of switch(token) */

    } /* end of do_forever */

} /* end of yylex() */



FILE: m.sym.c

Symbol Table Routines 



#include "m.def"
#include "m.decl"

setsy(a)
    char *a; {
    int i;

    /* set symbol to point to symbols[a - symbols] */
    symbol = a;
    syrel = symbol - symbols;

    /* set maxsy so no overflow can occur
       when filling the symbol table */
    if (high(i = csp - 1 - symbol) == 0)
        maxsy = low(i) & 0376; else
        maxsy = 254;

    /* note that maxsy <= 254 */ }



/* the getxxx procedures which follow assume 
that symbol is set to the base of the 
currently referenced symbol table entry */ 



char getsym(i)
    char i; {
    return (symbol[norm(i)]); }



char getlast() {
    /* get the value of the 'last' field */
    return (getsym(lastf)); }



char gettype() {
    /* get the value of the 'type' field */
    return (getsym(typef) & 017); }



char getsize() {
    /* get the value of the 'size' field */
    return (getsym(sizef)); }



char getname(i)
    char i; {
    /* get character i of the 'name' field */
    return (getsym(norm(i) + finfo)); }



int gethcoll() {
    /* get the hash collision field */
    char i;

    i = getsize() + finfo - 2;
    return (maketwo(getsym(i),
        getsym(i + 1)));}

char getresno() {
    /* get the reserved word number */
    return (getsym(getsize() + finfo)); }

char getmsize() {
    /* get macro size */
    return (getsym(getsize() + finfo)); }

char *getmdef() {
    /* get the absolute address of
       macro definition base -1 */
    return (norm(getsize()) + finfo + symbol); }

int getsyno() {
    /* get the symbol number */
    /* assumes 10 bit field */
    char i;

    i = getsize() + finfo;
    return (maketwo(getsym(i), getsym(i+1) & 03)); }

char getprec() {
    return ((getsym(typef) & 0160) >> 4); }

char getbased() {
    /* get the based variable field */
    return (getsym(typef) < 0); }

int getbsyno() {
    /* get the bsyno field */
    /* assumes a 10 bit field */
    char i;

    i = getsize() + finfo + 2;
    return (maketwo(getsym(i), getsym(i + 1) & 03)); }

int getlen() {
    /* get the length field */
    /* assumes a 6 bit (short) or 14 bit (long) field */
    char i; int l;

    i = getsize() + finfo + (getbased() ? 3 : 1);
    l = norm(getsym(i)) >> 2;
    return ((gettype() == lvect) ?
        (norm(getsym(i + 1)) << 6) | l : l); }



/* the setxxx procedures which follow assume
   that symbol is set to the base of the
   currently referenced symbol table entry */



setsym(i,x)
    char i, x; {
    if (norm(sylast = i) > norm(maxsy)) tflag("to");
    symbol[norm(i)] = x;
    }

settype(t)
    char t; {
    setsym(typef, (getsym(typef) & 0360) | t); }

setsize(s)
    char s; {
    setsym(sizef,s); }

sethcoll(hc)
    int hc; {
    char i;

    setsym((i = getsize() + finfo - 2),low(hc));
    setsym(i + 1,high(hc)); }

setresno(i)
    /* set reserved word number */
    char i;
    { setsym(finfo + getsize(), i); }

setsyno() {
    /* set the symbol number field */
    /* assumes 10 bit field */
    char i;

    if (nsym > maxsyno) tflag("too many symbols");
    setsym((i = getsize() + finfo), low(nsym));
    setsym(i + 1, high(nsym++) | (getsym(i+1) & 0374));
    return(nsym - 1); }

setprec(p)
    /* set the precision field */
    char p; {
    setsym(typef, (getsym(typef) & 0217) | (p << 4)); }

setbased(b)
    /* set the based variable field */
    char b; {
    setsym(typef, (getsym(typef) & 0177) | (b ? 0200 : 0)); }

setbsyno(i)
    /* set the bsyno field of a based variable
       entry to the symbol number of the base */
    /* assumes a 10 bit field */
    int i; {
    char j;

    setsym((j = getsize() + finfo + 2), low(i));
    setsym(j + 1, high(i) | (getsym(j+1) & 0374)); }

setlen(l)
    /* set the length field */
    /* assumes a 14 bit (long) field */
    int l; {
    char i;

    if (l > maxlen) flag(" vector length too large");

    /* based field must have been set already */
    i = getsize() + finfo + (getbased() ? 3 : 1);
    setsym(i, (low(l) << 2) | (getsym(i) & 03));
    setsym(i+1, l >> 6); }



int found() {
    /* returns true if symbol does not address
       the base of the 'symbols' vector */
    return (syrel); }

lookup() {
    /* look for accumulator match in symbols
       based upon current value of hashcode */
    char i, *j;

    /* 'symbol' is set to the top-most symbol
       with this hashcode */
    setsy((j = hentry[hashcode]) ? j : symbols);

    /* the value of the 'found' procedure is false
       if the symbol name cannot be found in the table */
    while (found())
        {/* 'symbol' points to possible match in table */
        if (getsize() == acclen + 2) /* then length match */
            for (i = 0; getname(i) == getvarc(i + 1 + tokindex); )
                if (++i == acclen) return;

        /* no match, so look again */
        setsy(gethcoll() + symbols);

        /* 'symbol' is now set to the next symbol
           with this hash code */
        }
    }



setchar(s1,v1)
    /* place characters from varc into symbol table
       starting at v1 in varc and s1 in symbol. the
       length of the transfer is obtained from varc[v1]. */
    char s1, v1; {
    char i;

    i = getvarc(v1);
    while (i)
        {setsym(s1,getvarc(++v1));
        s1++; i--;
        } }

setname(s)
    /* set size, name, and hcoll fields
       from var at s */
    char s; {
    char k;

    setsy(sytop);
    setsize(getvarc(k = var[s]) + 2);
    setchar(namef,k);

    /* temporarily store hashcode in hcoll field */
    sethcoll(hash[s]); }

fixhcoll() {
    /* fix the hash chains using the hashcode
       value stored in the hcoll field */
    /* assumes symbol has been set */
    char *p; int i;

    sethcoll((p = hentry[i = gethcoll()]) > 0 ?
        (p - symbols) : 0);
    hentry[i] = symbol; }

syfin() {
    /* finish construction of a symbol table entry,
       assuming the highest field in the entry was
       filled last (thus setting sylast). */

    /* note that sylast <= 254 */
    setsy(sytop =+ (++sylast));

    /* now addressing next symbol table entry */
    setsym(lastf,sylast); }

enterblk() {
    /* enter a new block level */
    if (++blklev > maxblk)
        {flag("bo"); blklev = 2;}

    blkv[blklev] = npush; npush = 0;
    curlev = block[blklev] = sytop; }

exitblk() {
    /* exit current block level */
    char h,j,i; char *p;

    /* remove innermost symbol table entries */
    if (--blklev < 1) blklev = 1;
    setsy(sytop); p = sytop;
    while (p > curlev)
        {p =- norm(getlast());
        setsy(p);

        /* entry removed; fix hash entry, if necessary */
        if (i = getsize()) /* > 0 then recompute hashcode */
            {h = 0;
            for (j = 0; norm(--i) > 1; j++)
                h = (h + getname(j)) & hashmask;
            hentry[h] = gethcoll() + symbols;
            }
        }

    /* remove any currently expanding macros */
    while ((macaddr[mactop] > p) && mactop > 0)
        --mactop;

    /* reset current level */
    npush = blkv[blklev];
    curlev = block[blklev]; }

enter(ptr,t,p,l)
    /* make an entry in the symbol table */
    char ptr,t,p; int l; {

    setsy(sytop);
    settype(t);
    setprec(p);
    if (ptr >= 0)

isetbasec((fixv[ptr] ); 

        setname(ptr);}
    else
        {setsize(0);}
    setsyno();

setflnn/r"'^ """ basev) setbSyno(O); 
    syfin(); }

change(t, p, l, n)
    char t, p;
    int l, n; {

    setsy(sytop);
    for (; n > 0; n--)

^rt'yoea^r' ■ = 

        setprec(p);
        setlen(l);
        }

    setsy(sytop); }

compress(ptr, n)
    /* remove the second byte of the length field
       from n symbol table entries starting at ptr */
    char *ptr; int n; {

)nt char *p; 

    if (!n) return;
    setsy(ptr);

    fixhcoll(); /* fix hash chains for first entry */

n ’ (.e<..se,o 7 , : a,; 

(setsy(ptrfi ); 

— * (getbasedO ? 3 ; 1 ); 

        for (j = 0; j <= i; j++)
            ptr[j] = symbol[j];

        /* also fix hashcode chains */
        setsy(ptr);
        fixhcoll();
p r = + k + 1 ; 

Ptr(0J r ptrfnj ~ 1; 
    sytop = ptr;

    setsy(sytop); }


